IS THE RATING OF HUMAN CHARACTER
PRACTICABLE? 


HAROLD RUGG 


The Lincoln School of Teachers College, and 
Teachers College, Columbia University 


Can Human Character Be “Rated” on Point Scales Accurately
Enough for Practical Uses in Education?


Yes, and No


Yes, if the rating is done under conditions as rigorous as
the following:

First, if each final rating given a person is the average of
three independent ratings, each one made on a scale as objectified
as the man-to-man-comparison type of scale.

Second, if the scales on which the ratings are made are comparable
and equivalent,¹ having been made in conferences under
the instruction of one skilled in rating scale work.

Third, if the three raters are so thoroughly acquainted with
the person rated that they are competent to rate.

But these conditions are practically unattainable in public 
schools. Hence the answer to our original question—No, not by 
methods so far generally employed, and probably not unless 
methods of rerating and checking judgments are carried far 
beyond present practical possibilities. 

¹ “Comparable and equivalent” is explained in detail later in the article.

We can now predict that, on a scale of 100 points, the probable
error of the best single rating that we can get under “experimental”
conditions is between 5 and 6 points. Thus a large proportion of even
“experimental” ratings will locate a person outside his true “fifth”
of the entire scale. And I assume that to locate a person within his
proper “fifth” of the entire scale is a sound and practical working
criterion for rating. Furthermore, we can predict that a single rating
by a typical school officer (supervisor, superintendent, principal) will
only rarely locate a person within his proper “fifth” of the entire scale.
And we can predict that a single rating on any one of the commonly
used types of scales, like Elliott’s or Boyce’s or Beatty’s or Hill’s,
will have a probable error of at least 10 points on a scale of 100 points
and hence be practically valueless.
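These predictions can be checked roughly by arithmetic. The sketch below makes assumptions that the article does not state: that rating errors follow a normal curve (so that the probable error is 0.6745 times the standard deviation), that a “fifth” is 16 points wide as in the A to E intervals given later in the article, and that the person's true position lies at the exact middle of his fifth, which is the most favorable case. The function names are illustrative only.

```python
import math
from statistics import NormalDist

PE_PER_SD = 0.6745  # for a normal curve, probable error = 0.6745 * standard deviation


def chance_outside_band(probable_error: float, half_width: float) -> float:
    """Chance that a normally distributed rating error is larger than half_width."""
    if probable_error <= 0 or half_width < 0:
        raise ValueError("probable_error must be positive and half_width non-negative")
    sd = probable_error / PE_PER_SD
    inside = 2.0 * NormalDist(0.0, sd).cdf(half_width) - 1.0
    return 1.0 - inside


def probable_error_of_average(single_pe: float, n_ratings: int) -> float:
    """Probable error of the mean of n independent, equally reliable ratings."""
    if n_ratings < 1:
        raise ValueError("n_ratings must be at least 1")
    return single_pe / math.sqrt(n_ratings)


HALF_FIFTH = 8.0  # assumed: half of a 16-point "fifth"

cases = [
    ("best single experimental rating", 5.5),
    ("single rating on a common scale", 10.0),
    ("average of three experimental ratings", probable_error_of_average(5.5, 3)),
]
for label, pe in cases:
    miss = chance_outside_band(pe, HALF_FIFTH)
    print(f"{label}: PE = {pe:.2f}, chance outside proper fifth = {miss:.2f}")
```

Under these assumptions roughly a third of single experimental ratings miss the proper fifth even in the most favorable case, more than half miss it when the probable error is 10 points, and the average of three independent ratings misses it only about one time in ten. A person whose true position lies near the edge of a fifth fares worse in every case.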

Hence the apparently dogmatic answer to the question, “Can
human character be ‘rated’ on point scales accurately enough for
practical uses in Education?” is No. We would far better give our
energies to the attempt to measure it objectively, than to make sub- 
jective judgments of it on point scales. The point cannot be made 
too emphatically that we should discard these loose methods of 
rating once and for all. We cannot justify wasting the time of our 
school administrators and deluding our teachers with fictitious “rat- 
ings” and “marks.” Even on one of the so-called “standardized” 
point rating schemes a single rating has little or no scientific validity. 

I propose to publish in these articles detailed evidence to aid us 
in setting straight our thinking about this matter. This evidence was 
collected under rare conditions—conditions that we may never be 
able to duplicate—certainly not unless we have another great war. 

Let us refresh our memories a bit about educational rating scales. 
The movement is about eleven years old. Elliott made the first 
suggestion in 1910 with a very elaborate scheme of some hundred 
traits to which were assigned “weights” or “credits.” A group of 
traits like “dynamic efficiency” was given 80 points and each of several
traits contributing to it was likewise assigned points—®5 or 25, as the 
case might be. In this way a person was rated on a scale of 100 points. 
The weighting of the separate elements was entirely arbitrary.

I was in a group of about a dozen trained school officers in 1911 
which used the Elliott scale. We rated the same group of ten teachers, 
sometimes several observers observing a teacher simultaneously. Of 
several hundred correlations made up from such ratings, practically 
no correlations exceeded 0.2. Many were negative. The conclusion
was clear that a subjective rating scheme, made up of an elaborate set
of abbreviated descriptions of traits, with weights arbitrarily assigned, 
and with rating done against no external standard, had little or no 
scientific validity. Yet the Elliott scale did much good in stimulating 
thought along the line of “How shall we measure these intangible and 
dynamic qualities?” 
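
The agreement between two raters who score the same ten teachers is commonly summarized by the product-moment correlation coefficient. The sketch below shows that computation; the two lists of scores are invented for illustration and are not data from the 1911 study, and the function name is hypothetical.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least two scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("correlation is undefined when one list has no spread")
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Hypothetical scores given by two observers to the same ten teachers.
rater_a = [72, 65, 80, 58, 90, 61, 77, 69, 84, 55]
rater_b = [60, 81, 66, 74, 70, 88, 59, 79, 63, 71]

print(f"r between the two raters: {pearson_r(rater_a, rater_b):.2f}")
```

A coefficient near zero, or a negative one, means that knowing one observer's score for a teacher tells almost nothing about the score the other observer will give.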

The Boyce scheme was little better: forty-five “qualities,”
described very briefly and with indefinite captions, rating in 10 divisions
but against no common external standard. Rating was thoroughly
subjective, with nothing external to the rater’s consciousness against
which to measure the thing rated. Comparisons of independent
ratings on this scheme showed insufficient agreement. (Boyce’s
own data were so meagre as to be inconclusive.)

Other attempts were made but always of the same subjective type. 

The first innovation and one promise of real progress was the
man-to-man-comparison scale. This was the product of a seminar
discussion at the Carnegie Institute of Technology, in a class conducted
by Professor Walter Dill Scott. The suggestion was applied in the
development of rating scales for employees in industry. Here at last,
it was said, was a method of judgment which took the process of rating
out of the realm of the “subjective” and gave it scientific standing;
it made rating “objective.” The other scales had employed no common
standard. The man judged was rated against nothing external.
Measurement, ran the dictum, implied comparison with a scale. The
conclusion followed: make your “rating scale” a scale of human
beings. The increments, the unit distances on your scale, will be
the distances or differences between “scale men.” How select these
scale men? First, by choosing “the best man you ever knew” and
writing his name down at the top point on your rating scale. Second,
by selecting “the poorest man you ever knew” and using him at the
foot of the scale. Similarly, you chose an “average-man,” a
“better-than-average-man,” and a “poorer-than-average-man,” making a five
“point” scale. You gave numbers to the five men, like 15, 12, 9, 6, and
3, and your “scale” was complete.

Rating was simple on it. You simply compared your man with
these five men. Was he as good as the best man, say, Richardson?
No. Was he better than Johnson, the poorest man? Yes, by far.
Was he superior to Bankson, the “average” man? Well, hardly. And
so it went until he was finally placed on the scale and given a score.

The scheme was ingenious—a new and unique suggestion. It 
attracted attention from personnel managers in industry. But its 
bases and implications were not thought through. Its validity and 
reliability were not determined experimentally. 

Just at this point, 1917, America entered the war, and the educa- 
tional and psychological brains of the country were harnessed into two 
remarkable working teams—the Psychological Division of the Army 
under Professors Yerkes, Terman and others, and the Committee on
Classification of Personnel under Professors Scott, Bingham, Strong, 
Coss, and others. 

Two important attempts to measure human abilities were made by 
the respective teams—the Army A, Alpha and Beta—group intelligence 
tests by the Psychological Division; and the Army Rating Scale for 
the rating of officers’ efficiency by the Committee on Classification of 
Personnel. 

The rating scale introduced was the man-to-man-comparison
scale to which I have referred. Careful canvass of all the available
scales was convincing of the fact that this scale was the most objective
and gave promise of the greatest validity and reliability. But practically
no scientific evidence was at hand from which to establish its true
reliability. Nor were tested methods of constructing and using scales at hand.
So it was necessary to introduce it into the army without this scientific
and basic evidence. Great credit should be given to Colonel Scott and
his colleagues for their skill and patience in the face of opposition in
finally securing a trial of this important suggested method of judging
character in Army officers.

The use of the Army Rating Scale during the winter, spring and 
summer of 1918, raised grave questions in the Committee as to its 
validity and reliability. Preliminary experiments with it in certain 
camps tended to confirm the suspicion of its unreliability. 

On being brought into the service of the Committee on Classification
of Personnel, as Statistician, in September, 1918, I was commissioned
to make an intensive analysis of the construction, use and
reliability of the Army Rating Scale. For nearly three months this
occupied the time of a staff of six statistical and clerical workers under
my direction.¹

Conditions were set up under the stress of a great war that it 
would be difficult if not totally impossible to duplicate in peace times. 
The remarkable conditions under which this investigation was carried 
on cannot be stressed too strongly. I doubt if we shall ever have 
again—certainly not unless we are again thrown into a great war—the 
opportunity to duplicate them. 

Imagine an experimental situation with groups, each of nearly
100 very intelligent officers (a total of 461, with an average alpha
score of B+), who had lived together for 11 to 14 months in depot
brigade; who had associated constantly (slept, dined, drilled, worked
and played together) until they knew each other’s personal characteristics
as only very intelligent, observant men who are literally
bound together can. Conditions would have to be set up as ideal as those
of an Arctic exploring expedition to provide a better situation for the
experiment. I say, imagine the remarkable circumstances in which
whole groups of such men could be brought together willingly, gladly,
for three-day conferences on the rating scale; in which with meticulous
care they would laboriously and immediately under supervision construct
rating scales by our most refined technique; in which they would
use each other for scale men on their scales; in which they would rate
each other on these scales, and finally, signing their own names, give us
their scales for scientific comparison.

¹ Great credit should be given to Miss Cecile Colloton, my present research
assistant, for her intelligent and thorough work in charge of the statistical
treatment of the data.

Certainly no public educational situation in America can duplicate 
in our generation, experimental conditions so favorable to the construc- 
tion and critique of rating scales. And the critical work was done on 
what is clearly the most objectified of all the scales suggested to the 
present time. Hence the importance of the data presented in this 
report. 

EVIDENCE FROM THE INVESTIGATION OF THE
ARMY RATING SCALE


The evidence for these introductory comments will be presented 
systematically. I reproduce first three samples of the man-to-man- 
comparison type of scale: A. The Army Rating Scale; B. The Rugg 
Rating Scale for Students; C. The Rugg Rating Scale for Teachers. 

The problem before us is this: How closely is a single rating of a per- 
son’s qualities made on the man-to-man scale a true measure of his qual- 
ities? Throughout the remainder of this article and the next one, the 
data and discussion will refer altogether to the Army Scale and the rating 
of the abilities of officers in the United States Army in 1917 and 1918. 
Analogies and applications to education will be drawn constantly. 

Selection of Criteria for Judging Validity of Ratings—The most 
difficult and important task we encountered at the outset of the 
investigation was the selection of criteria against which to measure 
the validity of ratings of character. Four were finally selected: 

The determination of:

1. The degree to which a number of officers agree in rating the same 
officer independently, both in total rating and on specific contributory 
traits. 
A. THE ARMY RATING SCALE

I. PHYSICAL QUALITIES.
Physique, bearing, neatness, voice, energy, endurance.
Consider how he impresses his command in these respects.

    Highest ........................ 15
    High ........................... 12
    Middle .........................  9
    Low ............................  6
    Lowest .........................  3

II. INTELLIGENCE.
Accuracy, ease in learning; ability to grasp quickly the point of view
of commanding officer, to issue clear and intelligent orders, to
estimate a new situation, and to arrive at a sensible decision in a
crisis.

    Highest ........................ 15
    High ........................... 12
    Middle .........................  9
    Low ............................  6
    Lowest .........................  3

III. LEADERSHIP.
Initiative, force, self reliance, decisiveness, tact, ability to
inspire men and to command their obedience, loyalty and cooperation.

    Highest ........................ 15
    High ........................... 12
    Middle .........................  9
    Low ............................  6
    Lowest .........................  3

IV. PERSONAL QUALITIES.
Industry, dependability, loyalty; readiness to shoulder responsibility
for his own acts; freedom from conceit and selfishness;
readiness and ability to cooperate.

    Highest ........................ 15
    High ........................... 12
    Middle .........................  9
    Low ............................  6
    Lowest .........................  3

V. GENERAL VALUE TO THE SERVICE.
Professional knowledge, skill and experience; success as administrator
and instructor; ability to get results.

    Highest ........................ 40
    High ........................... 32
    Middle ......................... 24
    Low ............................ 16
    Lowest .........................  8
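
The way the five parts of the Army Rating Scale combine into a total can be sketched as follows. The sketch assumes that an officer may be placed anywhere between the lowest and highest scale man on each trait, and that the lowest value for general value to the service is 8; that value is inferred from the step of 8 points between the other values and from the 20-point minimum of the A to E intervals given later in the article. The names used are illustrative.

```python
# (lowest, highest) value for each part of the Army Rating Scale.
ARMY_SCALE = {
    "physical qualities": (3, 15),
    "intelligence": (3, 15),
    "leadership": (3, 15),
    "personal qualities": (3, 15),
    "general value to the service": (8, 40),  # lowest value of 8 is inferred
}


def total_rating(trait_scores: dict) -> float:
    """Sum the five trait scores after checking each against its scale limits."""
    if set(trait_scores) != set(ARMY_SCALE):
        raise ValueError("a score is required for each of the five traits, and no others")
    total = 0
    for trait, (lowest, highest) in ARMY_SCALE.items():
        score = trait_scores[trait]
        if not lowest <= score <= highest:
            raise ValueError(f"{trait}: score {score} is outside {lowest}-{highest}")
        total += score
    return total


example = {
    "physical qualities": 12,
    "intelligence": 9,
    "leadership": 12,
    "personal qualities": 9,
    "general value to the service": 24,
}
print("total rating:", total_rating(example))
```

On these values the total runs from 20 to 100, a spread of 80 points, with general value to the service carrying 40 of the 100 points.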
B. THE RUGG RATING SCALE FOR STUDENTS

The rating scale containing the names of typical students who can be compared
with the student to be rated

(Primarily for teachers and principals to give the students a numerical rating)

I. Ability to learn, to assimilate new ideas*

    [Best] ......................... 38
    Better than average ............ 30
    [Average] ...................... [illegible]
    Poorer than average ............ 14
    [Poorest] ......................  6

II. Qualities of industry and attitude toward school work
III. Qualities of leadership
IV. Team-work qualities
V. Personal and social qualities

[Qualities II to V are each laid out on the same five steps, with a space for
the names of the scale students and a summary numerical rating; their values
are not legible in this copy.]

Total numerical rating ........

*Some forty odd questions are answered about the pupil’s traits. These serve
as complete definitions of the qualities. Copies of the two Rugg Rating Scales
can be secured from the University of Chicago Bookstore, 5802 Ellis Ave., Chicago.

2. The degree to which officers’ scales are comparable and represent
equivalent amounts of the traits in question—personal qualities, 
physical qualities, intelligence, leadership, and general value to the 
service. (In place of these read for analogy to education the separate 
qualities on my scales.) 

3. The degree to which the scale positions of officers used on the
“Intelligence” element of the Rating Scale correspond to scale
positions determined by three objective psychological tests.

4. The degree to which the Rating Scale detects differences in 
ability which are detected by other conspicuous measures of success. 
The most important was: appointment to a captaincy from civil life 
without having had previous military experience. 

Four types of data were secured in the investigation: 

1. More than 100,000 “official” quarterly ratings. (Once every
three months each officer from colonel down in each army unit was
rated on the Rating Scale by his immediate superior officer. The
spring, summer and autumn ratings, 1918, were treated independently.)

2. Ratings obtained in September, 1918, at an officers’ personnel 
school, conducted at Fort Sheridan by Lieutenant Colonel J. J. Coss. 

3. Scales, ratings, re-ratings, and detailed personal data from the 
two groups of officers, 461 in all, who cooperated in an experimental 
study of the rating scale at Camp Sheridan and Camp Zachary Taylor. 

4. Correlation and other data, of ratings and of other measures 
of success in the army, such as previous annual earnings, promotions, 
years of schooling, age and scores made upon psychological and alert- 
ness tests. 

The rating of officers in the Army parallels in its practical features 
the rating of school teachers and students. It is important to stress 
the great emphasis on the practical needs of rating and to caution the 
reader that refinement in rating was not expected. The one criterion 
that was constantly uppermost in our minds was: Is the probability 
great that a rating given an officer by the use of the Army Rating 
Scale will locate him in that fifth of the entire scale in which he would 
be placed by an objective measure of success, if such could be found? 
Thus, it was assumed that the Army wished to discriminate officer 
ability with no greater degree of refinement than that which would 
classify all officers in five groups. This likewise is typical of our 
school situations. For convenience we may think of these as falling
into the following numerical intervals on the scale, and as being
described by some such suggested code symbols as: A = 84-100,
B = 68-83.9, C = 52-67.9, D = 36-51.9, E = 20-35.9.
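
The five intervals can be written as a small lookup, which makes it easy to see when two ratings of the same officer land in different fifths. This is a minimal sketch; the function name is hypothetical, and the only figures used are the interval limits given above.

```python
def fifth_of_scale(total_rating: float) -> str:
    """Return the code symbol (A to E) for a total rating between 20 and 100."""
    if not 20 <= total_rating <= 100:
        raise ValueError("total rating must lie between 20 and 100")
    if total_rating >= 84:
        return "A"
    if total_rating >= 68:
        return "B"
    if total_rating >= 52:
        return "C"
    if total_rating >= 36:
        return "D"
    return "E"


# Two ratings of the same officer that differ by 28 points.
low_rating, high_rating = 52, 80
print(fifth_of_scale(low_rating), fifth_of_scale(high_rating))
```

Each fifth is 16 points wide, so two independent ratings that differ by more than 16 points cannot both place an officer in the same fifth.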

Thus in our analysis we will keep constantly in mind that approximate
accuracy is all that is demanded or that could be secured, under
actual working conditions, either in the army or in the public schools.

Assumptions Implicit in the Man-to-man-comparison Scale.—The 
construction and use of the Army Rating Scale rested upon three 
fundamental assumptions: 

1. That the scales made by various officers will be comparable
and equivalent with respect to the absolute amounts of trait represented
upon them. More concretely, Captain Evans’ scale for
“Intelligence” will be comparable to Captain Brown’s scale for
“Intelligence” in that the two “15” or “highest” men will represent
approximately the same degree of the trait; similarly for the two
“3” or lowest men, the “12” men, the “9” men and the “6” men.
It should be pointed out that this assumption will be implicit in the
construction of any rating scale, the purpose of which is to lead to an
objective and absolute measure of an officer.

2. But this in turn points to the second assumption; namely, that
officers of varying grades of ability are so distributed throughout the
various units of the army (in camps, cantonments, schools, etc.) that
each rating officer will have represented on his original list, and on his
scale, approximately the same differentiation and distribution of the trait
in question. More concretely, that the spread and absolute amounts
of the “physical qualities,” say, on one officer’s scale will be approximately
equivalent to the spread and amounts on another officer’s
scale and that the intervals on the scale represent approximately the
same differences in amount of the trait.

3. Finally, the assumption is implicit that army officers can be
trained to evaluate the abilities of their associates and subordinates
sufficiently to construct scales which will be comparable, and to make
the “man-to-man-comparison” required for the rating of their
subordinates.

Objective Measures of Success—As in education, so in the army, 
very few adequate measures of success were available. For example, 
we had no measure of success in overseas fighting. To make possible 
a complete analysis of the validity of the Rating Scale a complete 
study should have been made of the relation between success in active 
service and all other facts which were available on officers: (1) their 
ratings on the rating scale, obtained both during the period of training
and in overseas service; (2) their citations and special rewards
of merit or demerit; (3) their promotions; (4) their previous annual
earnings in civil life, years of schooling, age, etc.; (5) the degree to
which they had exercised responsible control of men and of policies
in previous civil life activities; (6) their preference for a particular
branch of service and their success in it; etc. Had the war continued
it is likely that this study would have been made.

We finally isolated four objective measures of success: 1. The 
first was found in the abilities of men conspicuously selected to officer 
the army. Officers were appointed from training camps to various 
commissions, some to the rank of 2d lieutenant; some to that of 1st
lieutenant; and still others at once to captaincies. It was confidently 
assumed that the men who were appointed to captaincies at once from 
civil life, without having had previous military experience, combined in 
an outstanding way the qualities demanded for success in the Army. 
Thus we implied that any measuring device used on officers ought to 
measure men in the long run in approximately the same way that 
appointment to a captaincy from civil life measures them. Through- 
out this study, this criterion was regarded as an important one in 
checking up the measuring power of the Rating Scale. Furthermore 
it contributed to an analysis of other measures of success in the army. 

2. The second objective measure of success in an officer is to be
found in his achievements in carefully conducted psychological tests.
During the previous six months evidence had accumulated rapidly
that the army psychological tests measured quite closely the types
of ability demanded for success in officers in the army. Reports
issued about that time by the Division of Psychology, e.g. (1) “Army
Mental Tests,” (2) Reports numbers 26, 27 and 28 of the Psychological
Board at Camp Wadsworth, South Carolina, offered a number
of concrete illustrations of this fact.

In correlating achievements on the army psychological tests with
ratings made upon the Army Rating Scale, it was recognized that the
two instruments do not measure the same identical group of abilities.
Some lack of correlation is expected because some of the
qualities included in the Rating Scale (even in the “Intelligence”
part of the Rating Scale) do not coincide with those involved in
performance on the psychological test. Expressed in statistical terms
we may say that there should be a reasonably “high correlation”
between psychological test scores and ratings, or any other measure
of success. A “high correlation,” represented conservatively by an
“r” of 0.5 to 0.6, means that there is a very distinct tendency for
differences in ability to be similarly detected by the two instruments.

3. As a third measure of success we canvassed the relation between
ratings and the typical qualification facts which were available on
officers: their previous annual earnings, their appointment to commissions
of various grades, their ratings, their promotions and their
previous occupational activities. It was recognized that at best,
promotions and “earnings” could be regarded as only partial measures
of success in the army. Promotions, for example, were contributed
to by so many factors other than that of military merit, that the
“promotion interval” from, say, appointment to a “2d Lieutenancy”
and subsequently to a captaincy must be regarded only as a very
coarse measure of achievement. These measures were regarded only
as supplementary.

4. With the foregoing measures of effectiveness of the Rating
Scale there was followed the practical criterion that independent ratings,
to be valid measures of officers, should show a very limited amount
of variability. This criterion led to the chief statistical method of
treating the data: the determination of the variability of independent
ratings on officers by different raters.


HOW CLOSELY DO INDEPENDENT RATINGS OF
CHARACTER AGREE?


Several elaborate sets of data were collected to answer this ques- 
tion: 

1. The “official” quarterly ratings, over 100,000 in number.

2. The ratings of 325 men in an officers’ personnel school.

3. The “experimental” scales and ratings of each other obtained
from 461 officers.

1. The Study of Two or More Official Ratings on an Officer. The
first step was to compare two or more official ratings made on an
officer, both by the same rater and by different raters; 2383 cases
were tabulated. (The spring ratings had already been proven of little
value, so the comparison was made for summer and autumn ratings.)
The average differences between the two ratings (on a total
scale of 80 points, it should be remembered) were as follows.

A. By the same rater:

    For Second Lieutenants ................. 10.2 points
    For First Lieutenants .................. 10.2 points
    [rank illegible] .......................  8.4 points

B. By different raters:

    For Second Lieutenants ................. 12.0 points
    For First Lieutenants .................. 21.7 points
    [rank illegible] ....................... 16.9 points


About half of the differences were increases and half decreases.
The medians were somewhat different, but the general conclusion was
inescapable: it was very improbable that an officer was located within
even his proper “fifth” of the entire scale by an “official” rating. The
ratings were practically valueless. Either the Rating Scale was
being improperly used or else the task of constructing the scale and
of making the man-to-man comparisons that are necessary is too
difficult and complicated to be compassed under the practical limitations
of army rating. (And these rating conditions are quite comparable,
if not superior, to those of education, certainly as to education
and experience of raters, administrative control over rating and the
like.)
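
The comparison of paired ratings described above comes down to two simple tallies: the average size of the difference between an officer's two ratings, and the number of increases and decreases. The sketch below shows both; the paired ratings are invented for illustration and are not drawn from the 2383 tabulated cases, and the function names are hypothetical.

```python
def average_difference(first, second):
    """Mean size of the difference between paired ratings, ignoring direction."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(a - b) for a, b in zip(first, second)) / len(first)


def count_changes(first, second):
    """Return (increases, decreases, unchanged) from the first rating to the second."""
    increases = sum(1 for a, b in zip(first, second) if b > a)
    decreases = sum(1 for a, b in zip(first, second) if b < a)
    return increases, decreases, len(first) - increases - decreases


# Hypothetical summer and autumn ratings of the same six officers.
summer = [62, 71, 55, 80, 47, 68]
autumn = [74, 60, 66, 69, 58, 68]

print("average difference:", round(average_difference(summer, autumn), 1))
print("increases, decreases, unchanged:", count_changes(summer, autumn))
```

An average difference of 10 points or more is large when set against a fifth that is only 16 points wide.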

Evidence was accumulating that the scale was being improperly
used. In fact it was known that at the first and second quarterly ratings,
thousands of officers were “rated” without the use of a scale at all.
We had evidence to show, however, that vast improvements were
made in the October ratings. The net result of the study of these
“official” ratings was to make clear the need for the study of agreements
in rating a person when the whole procedure of constructing
and using scales is definitely controlled.

I would caution the reader that in only a limited sense do the
sweeping conclusions stated at the beginning of this article rest upon
the analysis of these “official” ratings. They are impressive only
as illustrations of the results that would likely follow from the utilization
of such a rating scale (as this 1918 official scale was) in most
school systems today. But my insistence that a single rating of character
on any judgment scale is invalid and of little practical value rests
upon far better evidence than was obtained from the study of these
official ratings. We turn next to data distinctly better than those
studied so far.

2. Differences in 6 to 31 independent ratings on the same officer
by associates in an officers’ personnel school. I quote directly from
Professor Coss’s account of the way he secured the data of this part
of our investigation:

“325 college men averaging about 20½ years of age, who had been
[...] Training Camp, were given a course in the work of the Personnel
Adjutant. These men were rated by the Rating Scale and were given
an intelligence rating.

“The officers who had instructed this group in military branches
were called together for an evening meeting in a small ballroom of a
nearby hotel and were instructed in the use of the scale. The instruction
was given under particularly favorable circumstances, the
attention was excellent, and the appreciation of the points to be noted
seemed general. These officers then individually made a rating
scale using second lieutenants as a base. They were then asked to
get together by groups on the basis of the companies to which they
had been attached. They were given lists of the men from their
companies and were asked to rate each man. They worked with
interest and carefully. The Company groups turned in ratings
both from the individual members and from the average of their
grades.

“One of the exercises of the school was the study of the Rating
Scale. The scale was explained. The soldiers were then required
to read from the printed matter on the scale and to make a scale using
second lieutenants as a base. They then rated each other. Each
man from every company rated all the other members of that company
and turned in his rating sheet. Captain Trabue of the Surgeon
General’s office conducted the intelligence test examination in which
the men took a lively interest. The intelligence rating of the group
was extraordinarily high.”

Table I supplies the findings for the first 15 men in Company I.
This is a thoroughly random sample of all the data. The typical
range of ratings on an officer, 30 to 35 points, shows that any one rating
selected at random may be a very unreliable estimate of an officer’s
true rating. He would be displaced by two whole, even three whole,
divisions of the scale. Some of his associates called him “excellent”
officer material, while others rated him as distinctly poor. We had
many instances in which an officer was rated in the top fifth by one
associate and in the bottom fifth by another.

Note the striking constancy of the standard deviations around 8
and of the probable error between 5 and 6. The conclusion is clear
that for probabilities of 20 to 1 we are assured that an officer will be
correctly placed only within a range of about 15 to 18 points, more
than one “fifth” of the scale itself.


TABLE I. AVERAGES AND MEASURES OF VARIABILITY OF 6-31 INDEPENDENT
RATINGS ON 15 OFFICERS IN A PERSONNEL SCHOOL AT FORT SHERIDAN

(These 15 are typical of entire group of 325)

Officer   Number who     Range of   His average   Average deviation   Standard      Probable
rated     rated him      ratings    rating        of ratings          deviation     error
   1          27          52-80        65.7             6.1              8.42         5.67
   2          23          38-67        52.9             6.7              8.11         5.47
   3          27          66-92        80.9             6.4              7.61         5.13
   4          30          36-73        53.5             6.4              8.50         5.73
   5          19          53-81        63.8             5.4              7.10         4.79
   6          31          48-83        64.4             5.8              7.52         5.07
   7          31          43-77        62.2             4.6              6.23         4.20
   8          28          43-71        56.6             5.9              7.16         4.83
   9          23          39-75        55.2         [illegible]         10.00         6.74
  10          27          32-65        48.3             6.4              8.52         5.75
  11          27          46-74        59.4             6.6              7.83         5.28
  12          29          48-89        75.1             8.2             10.34         6.97
  13      [illegible]     37-70        54.0             5.9              7.46         5.03
  14          25          37-66        54.1             6.2          [illegible]   [illegible]
  15          25          43-82        61.9             7.3              9.23         6.23
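The relation between the last two columns of Table I, and the "15 to 18 points" quoted for odds of about 20 to 1, can be checked with a short sketch. It assumes that ratings are normally distributed, that the probable error is 0.6745 times the standard deviation, and that the quoted range means plus or minus three probable errors; none of these assumptions is stated outright in the text, and the function names are illustrative only.

```python
import math

# For a normal distribution, probable error (P.E.) = 0.6745 x standard deviation.
PE_FACTOR = 0.6745

def probable_error(sd):
    """Probable error implied by a standard deviation, assuming normality."""
    return PE_FACTOR * sd

def coverage_within(k):
    """Chance that a normally distributed rating falls within +/- k P.E. of the mean."""
    z = k * PE_FACTOR                    # convert P.E. units to S.D. units
    return math.erf(z / math.sqrt(2))    # two-sided normal coverage

# (standard deviation, probable error) for officers 1-5, copied from Table I
table_rows = [(8.42, 5.67), (8.11, 5.47), (7.61, 5.13), (8.50, 5.73), (7.10, 4.79)]
for sd, printed_pe in table_rows:
    print(sd, printed_pe, round(probable_error(sd), 3))

p = coverage_within(3)    # assumption: the "20 to 1" range is +/- 3 P.E.
odds = p / (1 - p)
print(round(p, 3), round(odds, 1))
```

Under these assumptions the recomputed probable errors agree with the printed ones to within about 0.01, and three probable errors of 5 to 6 points give the 15 to 18 point range at odds somewhat better than 20 to 1.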


(To be continued.) 


(In the December installment will be printed the data secured in
the experimental studies at Camps Sheridan and Taylor, and part of 
the analysis of the psychological factors involved.) 











SUBJECTIVE TESTS VS. OBJECTIVE TESTS 


A. A. ROBACK 
Harvard University 


It would seem more of an anachronism for any psychologist to 
advocate at this advanced stage of the testing game the introduction 
of tests which require a certain amount of interpretative ability on the 
part of the examinee and considerable judgment on the part of the 
tester. Mental testing was, of course, bound in the direction of an 
objective goal for various reasons, chiefly, however, on account of the 
needed uniformity in the scoring which would involve possibly millions 
of cases. Absolute standardization of the scoring directions and com- 
putations was the desideratum of the intelligence test movement. 
Very rarely a subjective test would spring up such as the interpretation 
of fables and the description of pictures; but in general, intelligence 
tests grew more machine-like or call it objective, if you like, from year 
to year, until the “multiple choice” sort of test has become most 
popular both with examiners and examinees, and with good cause; for 
both had an easier task before them under such conditions than when 
the answer was not suggested on the examination. The value of such 
tests was indeed manifest at the time of the war when the stupendous 
number of men examined would have made it quite impossible to 
carry out the examinations with the dispatch and efficiency shown by 
the corps of intelligence testers engaged. In drawing the line between 
a feeble-minded person and one of normal intelligence such tests will 
doubtless furnish us with considerable diagnostic information, but it is 
questionable whether a method where the correct answers are supplied 
together with several possible but incorrect answers could give us an 
insight into the mental caliber of the individual examined or allow for 
the numerous variations to be expected in a large group. 

Objective tests are satisfactory only in mathematical and mechan- 
ical problems, where only one solution is possible; and even there, 
one may long to take into consideration the different modes of approach 
discoverable in a group. In many other objective tests, however, 
there are decided disadvantages, and mental testers are only deceiving 
themselves when they suppose that the merit of a test is to be adjudged 
on the basis of the scorer’s time and energy it saves. It is undeniably 
true that those tests in which a number of possibilities are put before 
the individual who is to underscore or appose a cross to the correct
answer, tests which I should designate by the name of “multiple
choice” problems, are the most economical, but when we stop to
analyse the situation as the examinees are confronted with the problem, 
can we assert with any measure of certainty that they are thinking out 
the solution? In tests of this kind it seems to me we are dealing with 
factors which may be regarded as components of intelligence, but 
which certainly cannot present themselves as the characteristic marks 
of mentality. To analyze a hypothetical instance, suppose we ask 
our examinee to underscore one of the following reasons for going to 
school: 

1. to get an education; 

2. to earn more money later on; 

3. to provide teachers with work; 

4. to have a good time; 

5. to become useful citizens. 

It is evident that a bright individual who would respond to a natural
query of this sort in an off-hand, yet correct manner, might become 
distracted by, if not bewildered at, the absurd possibilities offered and 
in consequence fumble about before making his final reaction. It 
would appear that the more direct and original a person is, the more 
apt would he be to flounder. The mediocre person in this case gains 
an advantage over the superior intellect. Other factors that count 
here are (a) suggestibility, (b) motor-co-ordination in manipulating 
the pencil, and (c) rapidity of decision. The intellectuel, though more
intelligent than the “red-blood,” would take a longer time to decide
between two alternatives, for either of which something might be
said, whereas he might have readily arrived at a sound conclusion
if the alternatives were not suggested to him.

We should perhaps be willing to admit that ease of decision, non- 
suggestibility, control of inhibitions and even motor co-ordination all 
enter into the make-up of intelligence, but I for one should be chary 
about viewing them as representative factors. I can conceive of
a superior mind with slow reactions or of a [word illegible] yet is
suggestible. The mere fact that one person can underscore a number 
of words a trifle sooner than another does not make the former more 
intelligent. 

There is a widespread tendency among intelligence testers to pro- 
duce the pragmatic criterion in meeting the contentions of critics, 
and one may anticipate in this connection an argument such as this, 
“What matters it whether our subjects belong to one type or another?
It is the results that count, and since A has a score of 200 while B 
managed to obtain only 198, we may safely conclude that A is the more
intelligent of the two.” 

In reply to this form of reasoning, I should say that unless we 
define intelligence as a certain score which one receives in a particular set 
of tests, we are not warranted in assuming that a higher score is 
absolutely indicative of higher intelligence and a lower score of lower 
intelligence. It still remains to be discovered, for instance, just how 
much of a given degree of success is due to temperamental traits and 
volitional tendencies of one sort or another, not to mention, of course, 
the part played by experience. 

When we stop to examine the source from which the “multiple
choice” tests derive their objectivity, we shall probably trace it to the 
arbitrariness of the deviser. Since he cannot exhaust the list of all the 
possible answers to a given problem, he must content himself with 
the supplementation of the few which occur to him as approximating 
in some way the correct solution. If instead of approximating the 
requisite answer, the few possibilities suggested are remote, such as 
hat: head: glove: boat: lamp: table: hand: key, the test must fall short 
of the purpose for which it was intended, unless it was designed to 
mark off the feeble-minded from the normal. 

On the other hand, if the other possibilities suggested are just about 
as correct as the requisite answer, e.g. in the case of the analogy square: 
triangle: circle: ellipse: cone: semi-circle: arc: oval, we have a subjective 
situation to deal with; for although the term cone seems to complete 
the most appropriate analogy, there is something to be said for each of 
the other forms as satisfying the requirement. The degree of sub- 
jectivity or objectivity, then, which attaches to any test of the “multiple 
choice” kind would depend on the parity or disparity of the possi- 
bilities supplied. But let us be mindful of the fact that in real life, 
alternatives that present themselves for action resemble the former 
category, hence the selective process called forth by the test is in no 
way representative of choice in actual life. 

A further objection against these cut-and-dried objective tests 
is the lack of provision for such qualities as initiative to figure in the 
examination. The man who can beat out a new track is given no 
more credit than the one who happens to choose the right path by 
noting that the others are only blind alleys. 

But the most serious difficulty of all that we have to contend with 
in our “multiple choice” tests is the comparatively small scope they
afford us to tap the higher mental capacities which serve to distinguish
the superior intellect from the average mind. The testing of reasoning
has been confined to the purely logico-mathematical problems; and,
what is worse, has been conducted wholly along the “true-false”
method. Just how much of the score in that particular type of test 
is due to guess-work and how much to ratiocination is a matter not 
easily ascertainable. 

Tests of abstraction, interpretation, tests to determine the degree 
of acumen or subtlety have been utterly neglected. The same applies 
to tests for critical ability, expression, and judgment tests, the sore 
need of which has occasioned the contrivance of substitutes which 
measure some phase of intelligence, but just what phase is not at all 
clear. Fancy the critic in actual life who is always provided by 
benevolent people with the cue of his criticism. Or what should we 
think of the interpreter of new movements and phenomena who 
must needs have at his elbow an inventory of possible interpretations 
out of which he selects the proper one? It is with such considerations 
before me that, in preparing my tests for superior adults, I was led 
to deviate from the highway of dubious efficiency and resort to the 
seemingly narrow by-way of precision, requiring an answer and not 
a line or a cross. The scoring is thereby rendered more difficult, 
but to compensate for the additional expenditure of time and effort 
we may repose far greater confidence in our results. If the old 
principle that we take out of an enterprise just about as much as we 
put into it holds true anywhere, it is in the field of mental testing; 
and any sacrifice of accuracy at the shrine of speed is deplorable on 
general grounds. 

The element of subjectivity in scoring tests in which the answers are 
not supplied will naturally be present to some extent, but experience 
has taught me that when the scoring directions are definite and com- 
plete and the examiners are asked to abide rigidly by the standards, 
the amount of personal bias entering into the scoring would be reduced 
to a negligible quantity. To be sure, it would require some training 
and more than a modicum of intelligence to score a series of tests for 
superior adults, but once the directions are followed and each test or 
portion of a test is scored by one person, uniformity is insured. 

In reply to critics who would urge that any test which does not 
make use of the device of objective scoring is bound to involve a 
personal factor, I should maintain that even if such fear be well- 
founded, there is no alternative under the circumstances unless we 
are prepared to delude ourselves into the belief that we are testing 
superior intelligence when in reality what we are doing is tapping
various degrees of mediocrity, and that too in a rough way. 

The multiple choice method is incontestably a good device in 
experimenting on animals, but when applied to men and women of 
high intelligence it affords us no adequate measure of the qualities 
we are bent on testing. After all, is it not the examinee’s intelligence 
that we are concerned with instead of the arduous labor of the exam- 
iner? Is it not more in keeping with scientific procedure to test a 
smaller number or fewer groups painstakingly rather than to accumu- 
late a vast body of data that are not wholly reliable because of the 
insidious infiltration of vitiating factors? One must realize in set- 
ting forth this argument, that under certain conditions “half a loaf 
is better than nothing” and we should accordingly be guided by our 
purpose and the circumstances in a given case. Thus during the 
great War, it was not the purpose of the intelligence examination 
committee to study individual differences or to single out men with 
exceptional mentality. Nor could anything but an objective set of 
tests be manipulated considering the huge size of the army to undergo 
the examination. 

In testing adults, however, for superior intelligence we have before 
us a different situation. In the first place, the number examined 
will necessarily be limited. The person responsible for the scoring 
would be expected to make himself thoroughly conversant with the units 
of measurement and the points of each problem, but should not be re- 
quired to overtax his faculty of judgment, that is to say, the nature 
of the test should be such as to demand concise and clear-cut answers. 

That the difficulty to be encountered in scoring “unguided” 
tests is exaggerated had become evident to me when I finished mark- 
ing some 300 acumen tests which at first seemed like one of the 
Herculean labors. It was probably the most subjective of my whole 
series,¹ and it looked as if the variety of answers and modes of tackling
the questions would baffle me; but I soon learnt to discriminate
between the correct and incorrect answers, with the result that, as 
far as I was concerned, the scoring appeared to be essentially the 
same as in the “true-false” tests. To take one or two instances: 
when the examinee is asked to show the significance of the adjectives 
in the following two clauses 

(a) He thought it would be a difficult but interesting task, 
(b) but it proved to be an interesting but difficult task. 


1 Roback: “Mentality Tests for Superior Adults.” 
only one type of answer is admissible, and that is that in (a) the idea
foremost in the mind was the interesting nature of the work, but in 
(b) the difficulty of the task outweighed the enjoyment derived from 
the interest. Other modes of expression might be equally suitable 
but the point of “interest foremost in (a) and difficulty foremost in
(b)” would have to be stated. Similarly, whatever else might be said
about the connotative difference between the words “haste” and
“speed,” the correct answer should relate the former to the agent’s
mental state and the latter to the objective result of covering a lot of 
space in a relatively short time. 


SUMMARY. 


Our objections to a set of tests which is made up exclusively of 
the cut-and-dried kind, either on the multiple choice plan or along 
the “true-false” line, may be enumerated as follows:

I. The objectivity of the tests does not attach to the general 
method of procedure, taking in the whole situation—purpose of the 
examination, mental functions of the examinee, etc.,—but is confined 
to the scoring only. In other words, the tests are devised with a 
view to the ease of scoring. Objectivity, such as this, is at bottom 
illusory, for its very raison d’être is the subjective desire of saving
time and labor. 

II. The superior adult not only misses the opportunity for mani- 
festing his ability under such conditions, but his very originality and 
initiative in thought become a burden to him, when the courses are 
mapped out for him, with the result that the mediocre person has the 
advantage over his intellectual superior. 

III. Purely objective tests must necessarily be artificial, in no 
way representing a life situation. 

IV. Some, at least, of the higher functions cannot be approached 
by objective tests. Interpretation, analysis, subtlety, power of 
expression, judgment and other abilities are inaccessible to the 
“multiple choice” or “true-false” tester.

V. The factor of guess-work in a given test of that kind is indeter- 
minate. In close scores, unless the disturbing element is eliminated, 
we have no means for proper comparison. 

VI. Objective tests afford us no avenue to the study of individual 
differences; and if differential psychology plays an important part 
anywhere in the different levels of intelligence, it is obviously in those 
levels above the average. 











AN EXPERIMENTAL AND STATISTICAL STUDY OF 


READING AND READING TESTS 
(Concluded) 


ARTHUR I. GATES 
Teachers College, Columbia University 


MONROE’S STANDARDIZED SILENT READING TEST¹


This test is a revision of the Kansas Silent Reading Test, devised 
by F. J. Kelly in 1916.² It consists of from 14 to 16 short paragraphs
each followed by a question which is answered by writing or under- 
lining a word. The rate of reading is determined by giving a credit 
for each paragraph read regardless of the answer. The comprehension 
score is the sum of the values of those paragraphs the questions of 
which are correctly answered. Five minutes time is allowed. The 
test includes a fore-exercise of one paragraph. Test I is designed for 
grades III, IV, V; Test II for grades VI, VII, and VIII.
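The two scores described above can be sketched as follows. The credit and value figures are hypothetical placeholders, not the published values attached to Monroe's paragraphs, and the `Paragraph` record and `monroe_scores` function are names invented for this illustration.

```python
from dataclasses import dataclass

@dataclass
class Paragraph:
    rate_credit: int           # hypothetical credit for having read the paragraph
    comprehension_value: int   # hypothetical value earned by a correct answer
    attempted: bool            # the pupil reached and marked this paragraph
    correct: bool              # the marked answer was right

def monroe_scores(paragraphs):
    """Return (rate, comprehension) as the text describes the two scores."""
    # Rate: credit for every paragraph read, regardless of the answer.
    rate = sum(p.rate_credit for p in paragraphs if p.attempted)
    # Comprehension: values of only those paragraphs answered correctly.
    comprehension = sum(
        p.comprehension_value for p in paragraphs if p.attempted and p.correct
    )
    return rate, comprehension

# Invented record for one pupil: three paragraphs reached, two answered correctly.
sample = [
    Paragraph(4, 2, True, True),
    Paragraph(5, 3, True, False),
    Paragraph(6, 3, True, True),
    Paragraph(6, 4, False, False),
]
rate, comprehension = monroe_scores(sample)
print(rate, comprehension)
```

The sketch makes the design point visible: the wrongly answered paragraph still adds to the rate score, which is the feature later contrasted with Burgess's practice.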

A defect of some importance in Monroe’s test is that the number
of successful responses that can be secured by chance is rather
large. In the case of several paragraphs the chance of
succeeding is one in two. The scale has been tested by marking the 
reaction without reading the paragraph. In the case of Test II, 
Form 1, a comprehension score of 12 was secured by checking the 
answers without reading the paragraphs. This score represents a 
grade of ability equal to the beginning of grade IV. 
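The size of such a chance score can be reasoned about with a minimal sketch. The paragraph values and chance probabilities below are invented for illustration; only the figure of one in two for some paragraphs comes from the text, and `expected_chance_score` is a hypothetical helper.

```python
def expected_chance_score(items):
    """Expected comprehension score from marking answers without reading.

    items: list of (paragraph_value, probability_of_a_lucky_correct_mark)
    """
    return sum(value * p for value, p in items)

# Hypothetical test of four paragraphs: two with a one-in-two chance,
# one with a one-in-four chance, one that cannot be guessed at all.
items = [(2, 0.5), (3, 0.5), (3, 0.25), (4, 0.0)]
print(expected_chance_score(items))
```

The more paragraphs that offer a one-in-two chance, the larger the share of the comprehension score that blind marking alone can be expected to yield.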

Forms I and II were given at intervals of nearly three months. 
The following correlations obtain between the two performances: 





                        Grade IV      Grade V       Grade VI
[Rate?]                   0.72          0.73          0.80
[Comprehension?]       [illegible]   [illegible]      0.56





These correlations give us little direct information concerning the 
tests except the suggestion that the rate scores are more stable than 
the comprehension scores, possibly for the reason that the factor of 
chance success in the latter is rather great. 





1 Monroe, W. S.: Monroe’s Standardized Silent Reading Test. Journal of 
Educational Psychology, 1918-19, 9, 303-312. 
2 Kelly, F. J.: Kansas Silent Reading Tests. Journal of Educational Psy- 
chology, 1916-17, 7, 63-80. 
We have found the test to be too easy for the upper grades.
Seventy per cent of our VIII grade pupils and over fifty per cent of 
the VII grade obtain a perfect rate score on Test 2 which is designed 
for grades VI, VII, and VIII. It was therefore impossible to use these 
results for purposes of correlation. 

In our opinion, the Monroe test would be a more useful instrument 
if it were constructed on the principle of the Thorndike-McCall, by 
combining Tests 1 and 2 with an extension into more difficult material 
than the most difficult Test 1. As the tests are now constructed, 
Test 2 duplicates (in terms of difficulty) fifty per cent of the material 
contained in Test 1 and includes only one paragraph of greater diffi- 
culty. There are many practical advantages for having a continuity 
of norms from grades III to VIII. With the Monroe test as it is, the 
brightest children in grade V cannot be tested by Test 1 and of those 
who can, no comparisons can be conveniently made with performances 
in the upper grades. 

Monroe has secured, it seems, a somewhat better method of control- 
ling comprehension while rate is being measured than either Brown or 
Courtis but his method is, in one respect, distinctly different from that 
used by Burgess. In the rate score Monroe gives credit even if the 
child fails to comprehend; Burgess gives no credit in such a case. 
Using the technique of partial correlations, Pressy¹ found that Monroe’s
rate score, with comprehension eliminated (or rather held constant),
yields a slightly negative correlation with teachers’ estimates
of reading ability. The validity of his teachers’ judgments as criteria 
is, of course, unknown. His correlations (0.27 for rate and 0.38 for 
comprehension) are very low when compared with those found in the 
present study. 
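The technique of partial correlation mentioned here can be sketched with the standard first-order formula. The inputs 0.27 and 0.38 are the correlations quoted above; the rate-comprehension correlation of 0.92 is borrowed from the present study's own figure for the Monroe scores and is only a stand-in, since the corresponding value in Pressy's data is not given here.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of x with y when z is held constant (first-order partial)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

r_rate_teacher = 0.27    # rate score with teachers' estimates (quoted above)
r_comp_teacher = 0.38    # comprehension score with teachers' estimates (quoted above)
r_rate_comp = 0.92       # assumption: stand-in taken from the present study

r_partial = partial_correlation(r_rate_teacher, r_rate_comp, r_comp_teacher)
print(round(r_partial, 3))
```

With these illustrative inputs the partial correlation comes out slightly negative, which shows how a positive raw correlation for rate can vanish or reverse once comprehension is held constant.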

Table XII gives the correlations of the Monroe rate and comprehension
scores with all other measures. “Rate” yields the same
correlations with both composites as “comprehension,” but both
appear to agree more closely with the composite for rate. The correlations
between Monroe’s rate and comprehension scores average
0.92 ± S.D. 0.03. Both scores correlate highly with Burgess and
Directions but not so well with Thorndike-McCall and Courtis
comprehension. From our studies of many cases known to be of
extraordinary slowness but of good understanding² it was clear that the
Monroe test measures either a quite different type of comprehension or 


1 School and Society, 1920, 11, p. 746. 
2 One such case is described in the discussion of the Thorndike-McCall. 
else measures it in a quite different way from the Thorndike-McCall.
The slow readers attain a low score on the former and a high score on 
the latter. If a measure of power of comprehension freed of the 
mechanical factors in reading is desired, the Thorndike-McCall is 
the test to use. 

The Monroe test yields only fair correlations with intelligence 
tests but higher with group tests than with the Stanford-Binet. The 
mean correlations with vocabulary tests range from 0.52 to 0.72. The 
correlations with Gray’s test of oral mechanics are fairly high (0.53 
and 0.62). The Brown comprehension score shows a zero correlation 
as it does with most other tests. On the whole, no evidence appears in 
support of Pressy’s finding that the Monroe comprehension score is 
more valid than the rate score; in fact, the rate score yields slightly 
better correlations with both speed and comprehension composites. 


COURTIS SILENT READING TEST NO. 2


The Courtis test measures speed and comprehension separately. 
Part I consists of a childish narrative of 567 words to be read for 
three minutes, the position being checked each half minute by encir- 
cling a word. In Part II the same material is broken into 14 para- 
graphs each followed by 5 questions which are to be answered in a 
word. The number of questions answered correctly in 5 minutes is 
the score for comprehension. The subject may reread a paragraph as 
often as desired. The author uses an “Index of Comprehension” 
which is the percentage of correct answers. We have found this 
percentage to be so frequently 100 in grades above the third that it 
has not been used in this study. 

The Courtis rate test provides no check upon the degree of com- 
prehension with which the subject reads. There is nothing to prevent 
radical changes in the speed on different tests save personal checks 
adopted by the subject. Following are correlations of a test with a 
re-test given three months later. 


III ............ 0.85
IV ............. 0.87
V .............. 0.70
VI ............. 0.48
VII ............ 0.57
VIII ........... 0.52
Mean ........... 0.666
S.D. ........... 0.16
While it is impossible to determine what allowances to make for the
long intervals, it appears that the consistency of performance on this
test is not as high as is desirable, especially in the upper grades, for
which the material is rather trivial and uninteresting. Generally, a
positive correlation between initial ability and improvability in a
function is found and if such were the case here, the interval would
make the correlation higher than it would have been after a day.

For Part II—the comprehension test—the correlations of the 
two trials were: 


III ............ 0.80
IV ............. 0.78
V .............. 0.65
VI ............. 0.80
VII ............ 0.80
VIII ........... 0.76
Mean ........... 0.765
S.D. ........... 0.05


The constancy of performance is higher and less variable in this 
part of the test which controls comprehension. Our data cannot be 
considered as trustworthy evidence on the reliability of the tests 
since it is impossible to take into account the effect of the interval. 
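The summary figures that close the two lists of retest correlations can be recomputed from the six values in each list. The sketch assumes that the last two entries of each list are the mean and the standard deviation of the six correlations above them, and that the standard deviation was taken with a divisor of N rather than N - 1; neither point is stated in the text.

```python
import statistics

# Test-retest correlations as printed, in order, for Part I (rate)
# and Part II (comprehension) of the Courtis test.
rate_retest = [0.85, 0.87, 0.70, 0.48, 0.57, 0.52]
comprehension_retest = [0.80, 0.78, 0.65, 0.80, 0.80, 0.76]

def summarize(correlations):
    """Mean and population standard deviation of a list of correlations."""
    return statistics.fmean(correlations), statistics.pstdev(correlations)

rate_mean, rate_sd = summarize(rate_retest)
comp_mean, comp_sd = summarize(comprehension_retest)
print(round(rate_mean, 3), round(rate_sd, 3))
print(round(comp_mean, 3), round(comp_sd, 3))
```

On these assumptions the recomputed values lie close to the printed ones, and they bear out the remark that the comprehension part is both higher and less variable in its constancy than the rate part.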

Table XIII shows the correlations of the mean scores of two 
Courtis tests with other single tests and the composites. The rate 
score yields a mean correlation of 0.76 ± S.D. 0.18 with the composite
of speed and 0.58 ± S.D. 0.19 with the corrected composite of
comprehension. The comprehension score yields a higher correlation
with the comprehension composite (0.70 ± S.D. 0.14) and a lower
correlation with speed (0.62 ± S.D. 0.20). The difference however is
not reliable and if we make allowance for the inclusion of rate score 
in rate composite and the same for comprehension, it appears that 
either part of the test is about as good as the other and that neither 
tests one aspect of reading any better than the other, in spite of the 
fact that the correlation of Courtis speed and comprehension averages 
but 0.44. The range of the speed-comprehension correlations for 
Courtis is from -0.27 to +0.82, a fact to be explained partly, in our
opinion, by the possibility of shifting the type of reading—now reading 
carefully, now rapidly—in the speed test. The comprehension score 
yields a higher correlation than the rate score with both the Brown 
and the Monroe rates, as well as with the Thorndike. The correla- 
tions with Burgess and Monroe comprehension are about the same. 
Evidence secured from our poor readers indicates that both forms
of this test really measure speed rather than power of comprehension, 
after the manner of the Thorndike test. Our Grade VIII poor reader 
is distinctly the slowest performer in the class in this test but above 
the average in Thorndike-McCall. The second form of the test 
seems to be in general more reliable since it is less easy to ‘‘fake”’ it. 

The correlations with Stanford-Binet mental age average zero, 
grade IV being negative and grade VI positive. The correlations with 
the composite of group intelligence tests are rather low, averaging 
0.35 ± S.D. 0.26 for Part I and 0.57 ± S.D. 0.13 for Part II. Correlations
with the vocabulary tests are low. In general, the S.D.s for
the Courtis tests are very high, indicating either great variability of 
performance or individual differences in type of reading, some of them 
of the sort not indicative of real ability as shown by other tests. The 
correlations with other criteria are not as high as those yielded by 
Burgess and Monroe. 


WOODWORTH-WELLS DIRECTIONS TEST


The Directions used were the three forms devised by Woodworth
and Wells¹ combined into one test. Those authors had in mind testing
ability to understand instructions. To quote their words: “The
conditions which it was sought to meet in the test material are: (1) that
the motor response should be very simple and quickly performed; (2)
that the instructions should be very simple, but varied; and (3) that
the instructions should be as concise as possible in order that reading
time might not be the determining factor.” Samples of the easy
directions are: “Cross out the g in tiger.” “Put a dot in the circle
below the center 0.” There are 40 directions of this or slightly
greater difficulty² and 20 are considerably more difficult. For
example: “Write yes, no matter whether China is in Africa or
not. . . ; and then give a wrong answer to this question: ‘How
many days are there in the week? . . .’”

The results were scored by the familiar method: right minus the
sum of errors and omissions. Three and a half minutes were allowed 


1 Woodworth, R. S. and Wells, F. L.: Association Tests. Psychol. Monograph
No. 57, Dec., 1911.

2 Pintner, R. and Toops, H., have empirically determined the difficulty of these
directions and published a revision. See Journal of Educational Psychol., 1918, 9,
123-142. We used the old forms because the material was needed in a hurry and
electrotypes were at hand.
but many of the lower grade subjects did not get as far as the Hard
Directions. Woodworth found, however, a correlation of 0.92 between 
the two forms, so they appear to yield approximately the same results. 
Table XIV gives the correlations. 
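The scoring rule for the Directions test, right minus the sum of errors and omissions, amounts to the following. The counts in the example are invented, and whether directions not reached within the time limit count as omissions is not settled by the text, so the sketch takes omissions as a number supplied by the scorer.

```python
def directions_score(right, errors, omissions):
    """Score = number right minus the sum of errors and omissions."""
    return right - (errors + omissions)

# Hypothetical pupil: 30 directions carried out correctly,
# 4 carried out wrongly, 6 passed over.
example_score = directions_score(right=30, errors=4, omissions=6)
print(example_score)
```

A rule of this kind penalizes a skipped direction as heavily as a wrong one, so a hasty and careless performance gains nothing over a slower and accurate one.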

The Directions test seems to be more like the Burgess than any 
other here used. This similarity is evidenced by the fact that it 
yields a higher mean correlation with the Burgess than with any other. 
In certain respects the Directions test is superior—in reducing the 
amount and difficulty of the motor response to a minimum. In some 
instances the Directions test put a demand on information (e.g. “If 
Edison discovered America,” etc.) and in other cases the directions 
are something of a verbal puzzle. A scale which combined the merits 
of the two tests would probably be superior to any test for rate of 
reading now available. 

With the exception of grade IV the correlations are fairly high 
and the test agrees, like the Burgess, about as well with the composite 
of comprehension as with speed. On the whole, it appears to measure 
reading ability about as well as most of the newer instruments. The 
correlation with Thorndike-McCall is low but, like the latter, it shows 
an increased correlation with Stanford-Binet as we pass from the 
lower grades up. With the exception of grade VII, correlations with 
the group intelligence tests are 0.6 or above and correlations with the 
vocabulary tests are around 0.5. 


GRAY’S ORAL READING TEST¹


This test consists of eleven paragraphs arranged in order of increas- 
ing difficulty. It requires some skill to use the tests in accordance with 
Gray’s directions. The time for reading each paragraph is taken 
with a stop watch and the number of errors noted. A table of credits 
representing a composite of speed and errors is provided and the result 
is multiplied by a figure to make allowances for grade differences, i.e.,
the higher the grade the lower the credit allowed. All of this is time- 
consuming and the result obtained is a score which allows comparisons 
only with norms and scores in that particular grade. This method 
may be pedagogically sound but in practical use it is very cumbersome, 
and teachers become chagrined at the time involved. In this study 
we have taken simply the sum of the credits for the paragraphs from 


1 Gray, W. S.: Studies of Elementary School Reading Through Standardized
Tests. Univ. Chicago Sup. Educ. Monograph, Vol. 1, No. 1. 
Gray’s table of composite speed and error scores, making all scores
comparable. 

A system of shorthand practices is provided by Gray who suggests 
checking the child’s reading for six types of errors: gross errors, minor
errors, omissions, substitutions, insertions and repetitions. While it 
requires some practice and skill to do this, the detailed information is 
frequently most valuable. The subjects must be treated individually, 
of course, with this test which requires from five to ten minutes. 

We have been unable to give re-tests with this instrument. Unfor- 
tunately but one form is available. No objective measure of compre- 
hension is provided, since the test deals frankly with the mechanics of 
oral reading. 

Table XV gives the results. The test yields a correlation of 0.59 
with the criterion of comprehension and 0.57 with rate. The correla- 
tions with individual and group measures of intelligence are lower than 
those obtained by the best tests of silent reading. Correlations of
approximately 0.5 appear with a single test of silent reading or of
vocabulary. It yields a correlation of about 0.5 with a pronunciation
test consisting of 36 words, containing from two to thirteen letters, 
used by the writer for diagnostic work. The criteria presented in this 
paper do not enable us to test the real value of this test. 

We have found it to be an exceedingly useful instrument, especially 
for purposes of individual diagnosis partly for the reason that the one 
who experiments can observe intimately the particular reactions. 
Its use for such purposes will be described in another paper. There is 
a real need for several new forms of this test. 


THE VOCABULARY TESTS 


Holley’s Sentence Vocabulary Scale was devised for the purpose of 
measuring intelligence.¹ It consists of seventy words from the Stanford-Binet
series printed in short sentences which are to be completed
by underlining one of four words which follow each. For example:
“Some puddles are made of...mud...sand...stone...brick.” No
time limit is fixed. 

Before the writer became aware of the existence of the Holley 
Scale, a similar test had been devised consisting of fifty words from 


1 Holley, C. E.: Mental Tests for School Use. Bureau of Educ. Research, 
University of Illinois, Bull. No. 4, pp. 86-91.

the Terman list, each followed by five words, one of which was to
be underlined to illustrate the meaning. For example: 

1. Gown—(dress, tree, bird, ram, fish.) 

44. perfunctory—(skillfully, odorous, speedily, carelessly, pretty.) 
There was no time limit. 

The Thorndike Visual Vocabulary was used in all grades, but an 
error in administration forced the results to be discarded except for 
grades V and VIII. In these grades the following correlations
appeared: 





Thorndike visual vocabulary with

            | Mental | Group   |         | Thorndike- |        | Special | Comp. | Comp.
            | age    | intell. | Burgess | McCall     | Holley | vocab.  | comp. | speed
Grade V     | 0.30   | 0.53    | 0.56    | 0.50       | 0.48   | 0.53    | 0.56  | 0.48
Grade VIII  | ....   | 0.47    | 0.49    | 0.47       | 0.52   | 0.56    | [illegible] | [illegible]























Tables XVI and XVII give the results for the other two tests. 
They show almost identical correlations throughout. The correla- 
tions with rate and with comprehension in reading range from 0.4 to 
0.6. The correlation with group tests of intelligence is 0.5 and about 
0.3 with the Stanford-Binet mental age. 

In the mass, knowledge of word meanings is positively associated 
with reading ability but the correlation is not sufficiently high to make 
a vocabulary test an adequate measure of it. The correlation of 
vocabulary and the composite of comprehension is 0.6 as compared 
to 0.8 for the Burgess test. In special cases we find wide variations 
between performance in reading and vocabulary tests. Likewise the 
vocabulary tests here used are positively—but not very closely — 
related to intelligence. Our knowledge of the functions involved in 
acquisition of word meanings is very meagre and hazy. We do not 
know to what degree various types of training—specific training with 
words, wide reading, etc.—will increase vocabulary. Certain studies 
suggest that general intelligence may set a rather strict limit upon 
such development. There is need for a review of the scattered data 
concerning vocabularies and a great need for extensive experimental 
and statistical study.

The completion of a most extensive inventory of words used in
English reading by Thorndike¹ offers new possibilities for scale construction
and research in this field. More than 4,500,000 words
from a selected list of sources were tabulated and of these, the 10,000
occurring most frequently have been printed in alphabetical form with
indexes indicating the relative frequency of occurrence.² These
words should form the content of tests for purposes of standardizing 
age and grade achievements, for diagnostic and experimental work, 
as well as for many other educational uses. 


CORRELATIONS WITH AGE, INTELLIGENCE AND PERFORMANCE 
IN OTHER SCHOOL FUNCTIONS 


The criteria were as follows: 

1. Chronological Age, 

2. Stanford-Binet Mental Age, 

3. A composite of 6 to 8 group tests of intelligence, 

4. A composite of three spelling tests including 182 words selected
from several columns of the Ayres-Buckingham list,

5. A composite of the Woody Arithmetic tests (all four operations),
Monroe’s Diagnostic (12 to 24 functions), and Monroe’s Reasoning
Test,

6. A composite of speed and quality in writing, 

7. A composite of school achievement including reading, spelling, 
and arithmetic, 

8. Judgments by 4 to 9 teachers of “school attitude” consisting of
what each teacher thought important such as application, diligence,
persistence, interest, willingness, etc.


Table XVIII gives the correlations with comprehension and Table 
XIX with rate. 

Both rate and comprehension are negatively correlated with 
chronological age and positively related to mental age. This is the 
usual finding: the correlations with mental age are low for grade III 
and increase to 0.6 or 0.7 in grade VI. Since the inter-correlations 
among reading tests were as high in the lower grades as in the higher, 
the increasing correlation with mental age may be interpreted to mean 





1 Thorndike, E. L.: Word Knowledge in the Elementary School. Teachers 
College Record, Sept., 1921, pp. 334-370. 

2 Thorndike, E. L.: “The Teacher’s Word Book.” New York: Teachers College
Bureau of Publications, 1921.

that intelligence, as measured by Stanford-Binet, shows itself only when
the mechanics of reading are fairly well mastered. Other interpreta- 
tions are possible and none can be wholly justified by our data. 


TABLE XVIII.—CORRELATIONS OF READING COMPREHENSION WITH—

         | Chron. | Mental | Group   |        |        | Comp.    |         | School
         | age    | age    | intell. | Spell. | Arith. | achieve. | Writing | attitude
Grade 3  | -0.16  |  0.10  |  0.65   |  0.63  |  0.17  |  0.77    |  0.53   |  0.46
Grade 4  |  0.05  |  0.16  |  0.77   |  0.54  |  0.28  |  0.72    |  0.41   |  0.22
Grade 5  |  0.16  |  0.41  |  0.68   |  0.04  |  0.11  |  0.81    | -0.36   |  0.29
Grade 6  | -0.63  |  0.69  |  0.88   |  0.36  |  0.26  |  0.71    | -0.06   |  0.43
Grade 7  | -0.39  |  ....  |  0.58   |  0.25  |  0.19  |  0.70    |  0.07   |  0.32
Grade 8  | -0.33  |  ....  |  0.59   |  0.57  |  0.41  |  0.72    |  0.00   | -0.22
Mean     | -0.22  |  0.34  |  0.69   |  0.40  |  0.24  |  0.74    |  0.10   |  0.25
S. D.    |  0.27  |  0.24  |  0.10   |  0.21  |  0.10  |  0.04    |  0.29   |  0.23

















TABLE XIX.—CORRELATIONS OF READING RATE WITH—

         | Chron. | Mental | Group   |        |        | Comp.    |         | School
         | age    | age    | intell. | Spell. | Arith. | achieve. | Writing | attitude
Grade 3  | -0.30  |  0.19  |  0.43   |  0.52  |  0.21  |  0.79    |  0.31   |  0.55
Grade 4  | -0.19  |  0.13  |  0.44   |  0.22  |  0.21  |  0.36    |  0.54   | -0.32
Grade 5  | -0.23  |  0.56  |  0.39   |  0.32  |  0.22  |  0.82    | -0.46   | -0.13
Grade 6  | -0.49  |  0.60  |  0.76   |  0.38  |  0.36  |  0.74    |  0.00   |  0.45
Grade 7  | -0.50  |  ....  |  0.69   |  0.42  |  0.29  |  0.84    |  0.16   |  0.54
Grade 8  | -0.39  |  ....  |  0.73   |  0.42  |  0.15  |  0.69    |  0.19   | -0.40
Mean     | -0.35  |  0.31  |  0.57   |  0.38  |  0.24  |  0.71    |  0.12   |  0.12
S. D.    |  0.12  |  0.30  |  0.15   |  0.09  |  0.07  |  0.16    |  0.30   |  0.40
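
The Mean and S. D. rows of these tables can be checked from the grade rows. What follows is only a minimal sketch, assuming that the S. D. is the population standard deviation (divisor N) of the six grade correlations about their mean, which is how the summary footnote describes it. The variable names are illustrative, and the figures are the "Comp. achieve." column of Table XVIII.

```python
from statistics import fmean, pstdev

# Correlations of reading comprehension with the composite of school
# achievement, grades 3 to 8 (Table XVIII, "Comp. achieve." column).
comp_achieve = [0.77, 0.72, 0.81, 0.71, 0.70, 0.72]

# Mean of the grade correlations.
mean_r = fmean(comp_achieve)

# Assumption: the S. D. is taken about that mean with divisor N
# (population form), not N - 1.
sd_r = pstdev(comp_achieve)

print(f"mean = {mean_r:.2f}, S. D. = {sd_r:.2f}")
```

For this column the sketch gives 0.74 and 0.04, the tabled values. Other columns may differ in the last digit if the original figures were truncated rather than rounded.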




















The correlations with the composite of group intelligence tests are
higher than with the Stanford-Binet and these are about as high in the
lower as in the higher grades. Both of these facts might be explained 
by the greater demands of the group tests on reading, which are 
rather uniformly stable in the various grades, but this explanation is 
in no way defensible by our data. 

The correlations with school attitudes are ambiguous. The significance
of the teachers’ judgments, as indicated by the correlations,
is not uniform. Whether different judges are estimating different
traits, or the same traits with varied success, or whether these traits
are quite differently distributed with reference to reading ability in the
different grades, the data do not disclose.

The correlations of reading with the composite of school perform- 
ance are high (0.70) and rather uniformly so among the grades. The 
correlations with single composite subjects are not high, the correlation
of 0.4 with spelling being the highest, followed by arithmetic at
0.24. Writing shows, in the mean, a correlation of approximately
zero but it is worth noting that the correlation is irregular with some 
indication of being positive in the lowest grades. 

The measures of comprehension and rate correlate about equally 
with the several criteria since they are themselves highly correlated. 

It must be noted again that the correlations of Tables XVIII and
XIX are valid only for comparison among themselves. They do not
represent ideal relations of these functions among unselected groups.
They do, however, clearly indicate a rather low degree of inter-dependence
of school functions which has an important bearing upon the
validity, for example, of practices based upon the concept of the
“accomplishment quotient” and other devices which assume but slight
specialization of abilities.





GENERAL SUMMARY AND CONCLUSIONS


I. Concerning Reading Ability in General. 
1. The concept of “silent reading ability” is justified both for 
theoretical and practical purposes by our data. 

(a) A single comprehension test given in 3½ to 30 minutes
yields a correlation of 0.7 to 0.8 with a composite of 
comprehension tests representing from 4 to 8 hours of 
reading. 

(b) A single rate test given in 1 to 5 minutes yields a correlation 
of 0.7 to 0.8 with a composite of rate tests. 

(c) If corrections were made for attenuation and for the 
restriction of range in our data, the correlations would 
certainly be higher than they are. 

2. Rate and comprehension are very highly correlated; the composites
for the two yield an uncorrected correlation of 0.84 ± S. D.
0.08.¹





1 The S. D. is the S. D. of the correlations for the separate grades from the mean
of the grade correlations.

3. Most of the tests do not differentiate rate from comprehension,
for the correlations of rate tests with the composite of compre- 
hension are about the same as with the composite of rate, and 
the correlations of comprehension tests are about the same 
with rate as with comprehension. 

4. In dealing with special cases of backwardness in the mechanics
of reading, a useful distinction between rate and comprehension
was discovered, of importance for
diagnostic and remedial work.

5. The correlation of a single test with the composite averages
0.7 or higher, while the mean inter-correlation of single tests
is about 0.5. While the former is higher partly because of
less attenuation due to the fact that the composite is a more
nearly perfect score, the difference is so great as to indicate
that the several tests measure somewhat different combinations
of the many functions involved in reading.

6. The correlation of silent reading with oral reading as represented 
by Gray’s Oral Reading Test is nearly 0.6, which in our data 
is a fairly high correlation. 

7. Correlations with two vocabulary tests (Holley’s and one devised 
by the writer) average about 0.6. 

8. Correlations with a composite of school subjects (reading, 
spelling, and arithmetic) are as high as 0.7 largely because 
reading is itself included in the composite. 

9. Correlations of comprehension with spelling average 0.40, with 
arithmetic 0.24, with writing 0.10. The correlations of these 
subjects with rate are about the same. 

10. Correlations with chronological age are negative: -0.22 for
comprehension; -0.35 for rate.

11. Correlations with Stanford-Binet Mental Age are not high:
0.34 for comprehension and 0.31 for rate.

(a) The correlation of mental age with comprehension in grade
III is 0.10, which rises steadily to 0.69 in grade VI. The
facts are similar for rate. This is not to be explained
by the lack of validity or reliability of the reading
tests, or variability in reading performance, because the
correlations among reading tests are as high in the lower
as in the higher grades. Nor is it likely to be due
to lack of validity of the Stanford-Binet tests. It is
probable that intelligence of the type measured by the
Stanford-Binet does not show itself in reading until
the mechanics of reading are fairly well mastered. There
are, however, other possible explanations.

12. The composite of group intelligence tests yields higher correla- 
tions with reading than does the Stanford-Binet Mental Age; 
the mean with comprehension is 0.69 and with rate 0.64. 

(a) The correlations in this case do not increase regularly as we 
advance from grade to grade. 

13. Correlations with teachers’ estimates of ‘‘school attitudes” 
such as interest, application, etc. are ambiguous, being high 
in some grades and low in others, for reasons which we have 
not discovered. 
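
Item I.1(c) above refers to correction for attenuation without stating it. As a hedged illustration only, the sketch below applies Spearman's standard correction, in which the observed correlation is divided by the square root of the product of the two reliabilities. The function name and the reliability figures are hypothetical; the article reports no reliabilities at this point, and no correction for restriction of range is attempted.

```python
import math

def correct_for_attenuation(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Spearman's correction: r_xy / sqrt(r_xx * r_yy)."""
    if not (0 < r_xx <= 1 and 0 < r_yy <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(r_xx * r_yy)

# Hypothetical reliabilities, for illustration only; they are not
# taken from the article.
observed = 0.70
corrected = correct_for_attenuation(observed, r_xx=0.80, r_yy=0.90)
print(round(corrected, 3))
```

Because both reliabilities are at most 1, the corrected value can never be lower than the observed one, which is the sense of the statement that the correlations "would certainly be higher."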


II. Conclusions Concerning Particular Types of Reading Tests. 
A. Tests for Rate of Reading. 

1. The mean of the grade correlations for rate tests with the
criteria for rate was: Brown 0.82 ± S. D. 0.18; Monroe
0.87 ± S. D. 0.03; Woodworth-Wells Directions 0.79 ± S. D.
0.14; and Courtis 0.76 ± S. D. 0.18.

2. Assuming the validity of the criterion, one of these tests 
appears to be as good as any other. 

3. The criterion is, of course, not perfect, and more detailed
study of the reliability as well as the validity of the tests
indicated:

(a) That a rate test which provides no control of compre- 
hension (Courtis) gives less reliable results than one 
which provides a partial control by requiring a repro- 
duction following the reading (Brown). 

(b) That a test which mechanically controls comprehension 
by completing a picture, answering questions, carry- 
ing out directions, etc. is best (Burgess, Directions or 
Monroe). 

4. The correlations of the rate tests with the criteria of comprehension
are high and support the statements in
I (3) above. They are: Courtis 0.58 ± S. D. 0.19;
Brown 0.66 ± S. D. 0.10; Burgess 0.80 ± S. D. 0.09;
Directions 0.78 ± S. D. 0.14; and Monroe 0.76 ± S. D. 0.10.

5. Usually the so-called “rate” tests measure comprehension as
well as rate, and the so-called “comprehension” tests
measure rate as well as comprehension. It is well to use
both parts of the combined tests (Courtis, Monroe)
since the results are more reliable, but they do not measure
perceptibly different functions of reading.
B. Tests of Comprehension. 

1. The means of the grade correlations with the criteria of
comprehension are: Burgess 0.80 ± S. D. 0.09; Brown
comprehension 0.16 ± S. D. 0.16; Courtis comprehension
0.70 ± S. D. 0.14; Directions 0.78 ± S. D. 0.14; Monroe comprehension
0.77 ± S. D. 0.08; and Thorndike-McCall
0.73 ± S. D. 0.12.

2. Assuming the validity of our criteria of comprehension, it 
appears that a record of what is written in a free repro- 
duction following the reading of a passage (Brown test) 
is not a valid measure of comprehension. 

3. Comprehension may be adequately measured by answering 
questions about short paragraphs which are of uniform 
difficulty (Courtis) or of increasing difficulty (Thorndike 
and Thorndike-McCall); by completing a picture in ac- 
cordance with directions contained in short paragraphs 
of equal difficulty (Burgess) or by checking, crossing 
out, underlining or writing words to indicate understand- 
ing (Monroe and Woodworth-Wells Directions). 

4. From the mean correlations with the criteria it appears
that one test is about as good and measures about the
same thing as another, but by applying these tests to
certain cases of backwardness in the mechanics of reading
it was found:

(a) That the Thorndike and Thorndike-McCall measure
power of comprehension freed of the mechanical
factors (speed) of reading.

(b) That success in the other tests is determined by speed as 
well as by comprehension. 

(c) These facts are indicated by the data (5) below. 

5. The correlations of comprehension tests with the criteria
of rate are: Brown 0.08 ± S. D. 0.25; Burgess 0.82 ±
S. D. 0.15; Courtis 0.62 ± S. D. 0.20; Monroe 0.83
± S. D. 0.04; Directions 0.79 ± S. D. 0.14; and Thorndike
0.52 ± S. D. 0.14.

C. General Conclusions Concerning the Tests. 

1. Detailed study of the individual tests betrays defects chiefly
of the following types:

(a) Too much time is spent in writing or drawing answers,
solving linguistic puzzles and in other irrelevant
functions.

(b) Certain variables, such as time spent in drawing, are 
not taken into account in standardizing the test units 
with the result that units are of unequal difficulty. 

(c) The play of chance in carrying out directions is some- 
times too great. 

(d) The units are often too coarse or too few. 

(e) Material is sometimes too easy or too difficult, trivial 
or uninteresting. 

(f) Methods of scoring are unsatisfactory. 

(g) Different forms of the test are of unequal difficulty.

2. Reading is a function which can be profitably measured and
in which rate and comprehension can be differentiated,
although most of our tests do not do so. The present
tests are useful but not perfect instruments. We need
tests constructed with such care that the numerous
defects found in the tests now in existence shall be avoided.


APPENDIX 





Table VIII, which gives the correlations for the Thorndike-McCall 
test was unintentionally omitted from the October issue. It appears 
in this issue. 








SMALLER VS. LARGER UNITS IN LEARNING 
TO TYPEWRITE 


J. W. BARTON 
Professor of Psychology, University of Idaho 


When one applies the general psychological principle that economy 
in human energy requires that for learning, “one should always begin
by doing a thing as nearly as possible in the way it is eventually to be
done,”¹ to typewriting, it is not surprising to find many instances of
responses that do not conform to this general statement of requirements 
in adequate learning. There are many instances, in the lesson guides 
now in use, of exercises that are performed in ways about as far from 
the way in which they are eventually to be done as one could make 
them. It is likely that in many other fields of learning—penmanship, 
spelling, music, prerequisites in college and industrial work—there is to 
be found this same tendency to abandon this scientific principle of 
learning with great waste to the learner in time and energy required. 

“Faculty psychology” seems to have made such decided inroads
on our practices that it is hard to weed out in one or two generations
the apparent transfer-of-training idea in the work of our schools. In
some instances of learning it is very difficult to recognize that we are
dealing with just this matter. Learning by means of too small parts is
only another form of this “general-culture” notion; particularly is this
the case if the parts are relatively much smaller than is required for 
comprehensive units, for if we should look upon learning as a means of 
fixing neural pathways of stimulus-response processes, then it is not 
difficult to understand why the “alphabet method” in reading was 
forced to give way to the larger unit methods now in use. 

What has been found true of ineffective learning in the case of
reading is here found to be true of the alphabet method in typewriting.
The two situations seem very much alike and the results indicated 
below are very suggestive of what might be accomplished in the pre- 
vention of waste in the acquiring of this skill. What has been assumed 
to be true in learning—that however a bodily² process is exercised it
will be equally efficient in any subsequent demands made upon it—is 
being shown, in many instances, to be fallacious and is thought to 


1 Hollingworth and Poffenberger: “Applied Psychology,” 1917, pp. 66-67.

2 It would be more nearly correct to use the term mental in this case except 
for the fact that it is now being recognized that the physical serves as a better 
means of approach in matters of control. 




466 The Journal of Educational Psychology 


involve avoidable waste. This is particularly true in situations involving
units too small to get all the benefits to be had from the readiness in
response conditions and those of identity in neural factors involved on 
subsequent response occasions. 

The Problem.—As far as could be determined at the time, typewriting
is now being taught by the small unit, or alphabet, method in most
institutions including this work. They all have in mind the two 
objects of accomplishment—mastery of key board and facility of 
operation—but the idea seems to be that mastery and facility in any 
situation is to effectively serve in others. Mr. J. W. Ross¹ points out
that the letter combinations learned in one word are no particular help 
for a different combination of the same letters in another word. This 
conclusion is in keeping with the modern psychological view of partial 
identity in neural impulse responses. In criticising the word method 
Mr. Ross says,² “This means that in the word method the transition
from writing the combinations of letters in one word to the combinations
in another word requires a momentary pause in the thought process 
necessitated by the effort to break up a manual habit that was not 
directed by a conscious mental effort. This mental pause is eliminated 
in the line method which establishes an uninterrupted flow of mental 
direction coordinated with a corresponding smoothness in manual 
operation.” 

It is likely a fact³ that there is a slowing up in such a situation due
to lack of organization in the two different nerve processes involved in
the two different word situations, or it could be explained on the basis
of lack of “readiness”⁴ in nerve preparation for action. If such is the
case in the situations indicated by Ross, what is to be said concerning
such letter combinations as are found in his first exercise? These are,
“asdf jkl; ;lkj fdsa jkl; ;lkj fdsa asdf jkl; ;lkj; ;fdsa.”

These are given, of course, for purposes of mastering the key-board; 
but if there is one bit of justification for the general principle of learning 
indicated above and for what Ross says for the units of his line method, 
it should follow that such exercises as those presented above are very 
inadequate as a means of teaching typewriting, since many of them 
never occur in English composition. Particularly has this conclusion 
some justification if it can be shown that the key-board can be mastered 





1 Ross: “Lessons in Touch Typewriting,” 1914, Preface.

2 Ibid.

3 No scientific evidence available. 

4 E. L. Thorndike: “Educational Psychology Briefer Course,” p. 53.

by means of exercises involving letter combinations had in the actual
composition material most in keeping with the work of the typist. 
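
The claim that the drill's letter combinations seldom occur in ordinary composition can be illustrated mechanically. The following is only a sketch: it lists the adjacent letter pairs in the first exercise quoted above and looks for them in one short sample of composition, namely the letter-writing prompt that the article later reports giving to Group W. The helper name `letter_pairs` is hypothetical, and a single sentence is far too small a sample to settle the question for English generally.

```python
import re

def letter_pairs(text: str) -> set[str]:
    """Return the set of adjacent letter pairs found inside words."""
    pairs = set()
    for word in re.findall(r"[a-z]+", text.lower()):
        pairs.update(word[i:i + 2] for i in range(len(word) - 1))
    return pairs

# First exercise of the small-unit method, as quoted above.
drill = "asdf jkl; ;lkj fdsa jkl; ;lkj fdsa asdf jkl; ;lkj; ;fdsa."

# Sample of ordinary composition: the prompt given to Group W.
sample = ("Write a letter to Sears and Roebuck ordering a pair of shoes "
          "which should cost $7.00 and to be sent by parcel post.")

drill_pairs = letter_pairs(drill)
shared = drill_pairs & letter_pairs(sample)

print(sorted(drill_pairs))   # pairs practised in the drill
print(sorted(shared))        # drill pairs that also occur in the sample
```

In this one sentence none of the ten drill pairs occurs, which agrees with the argument of the text but does not by itself establish it.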

What is held to be true of letter combination situations is also true 
for words or for sentences in so far as they fall short of the general 
principle indicated above, for learning in any situation of isolation has 
in it this lack of neural organization so necessary to subsequent 
responses if they are to be adequate. It is likely that Peterson¹ is
very near the actual situation in learning when he defines it, partially
at least, in terms of completeness of response. It might be well to
point out that another factor here involved in learning is that of
“fitness in neural organization.” It seems that for every new situation
presented there is more or less readjustment throughout. This
readjustment likely consists, for the most part, of nerve units involved 
in the matter of making ready for the adequate response. To meet 
these demands of learning is to provide situations requiring a minimum 
of reorganization to the point of immediate adjustment, as well as to 
provide a maximum of exercise in the complete context of activity as 
it is to be used when learned. 

It was this very apparent lack of fitness in the situations of 
response, involved in these isolated exercises so widely used in the 
teaching of typewriting, that prompted this study. It was felt that 
what had been true concerning the inadequacy of the alphabet method 
in teaching reading would likely prove true for the “isolated-small-unit-system”
used in the teaching of typewriting, and that a better way
might be had by conforming more closely to the psychologically 
justifiable larger unit exercise material plan. 

The Method.—Two groups of students were selected, the first
(fifteen subjects) by the process of the regular registration in high 
school and a second group (twelve) by special registration at a later 
date. 

The first group began work the first Monday in September, 1918,
under the direction of a teacher whose scholastic qualifications consisted
of a high school training, two full years of college work, and
graduation from a standard business college. She was without previous
teaching experience.

Nothing was said to the teacher at the time of registration about
another class being started later, for which reason the students
were directed according to what such a teacher, just out of such training,


1 Psychological Review, Vol. XXIII, No. 2, Mar., 1906.

thought to be the most approved methods, using the Remington
Lessons as a guide.

During the eleventh week after the opening of school a part of
the second group (eight subjects) was started. Later four additional
ones were taken in. These students (known hereafter as
Group W; the regularly registered group will be referred to as
Group P) were selected by advertising that another class in typewriting
was to be started for a limited number of additional students.
While no mental tests were used by which the relative mentality of
the individuals was determined, the average class standings in all
subjects for the year 1918-1919 indicate for the two groups a high
uniformity in success at school work. The marks are as follows:


      Group P       |      Group W
Subjects | Grades   | Subjects | Grades
    1    |   B      |     1    |   C
    2    |   B      |     2    |   C
    3    |   B      |     3    |   C
    4    |   C      |     4    |   B
    5    |   A      |     5    |   B
    6    |   C      |     6    |   B
    7    |   B      |     7    |   B
    8    |   C      |     8    |   A
    9    |   C      |     9    |   C
   10    |   B      |    10    |   D
   11    |   B      |    11    |   C
   12    |   C      |    12    |   C
   13    |   D      |          |
   14    |   C      |          |
   15    |   B      |          |
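
The "high uniformity" of the two groups can be made a little more concrete by tallying the marks above. This is a rough sketch only. The conversion of letters to points (A = 4 down to D = 1) is an assumption introduced for illustration; the article gives letter grades and no numerical scale.

```python
from collections import Counter

# Average class standings for 1918-1919, transcribed from the table above.
group_p = list("BBBCACBCCBBCDCB")   # subjects 1 to 15
group_w = list("CCCBBBBACDCC")      # subjects 1 to 12

# Assumption: a conventional 4-point scale, not stated in the article.
POINTS = {"A": 4, "B": 3, "C": 2, "D": 1}

def summarize(grades):
    """Return the tally of letter grades and the mean grade points."""
    counts = Counter(grades)
    mean_points = sum(POINTS[g] for g in grades) / len(grades)
    return counts, mean_points

p_counts, p_mean = summarize(group_p)
w_counts, w_mean = summarize(group_w)

print("Group P:", dict(p_counts), round(p_mean, 2))
print("Group W:", dict(w_counts), round(w_mean, 2))
```

On this assumed scale the two means lie close together (about 2.5 for Group P and 2.4 for Group W), with Group P slightly the higher.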








The two groups were directed by the same teacher, for the most 
part, and were under the same instructions in typewriting in every 
way except in the matter of the kind of exercise material worked 
upon and in a few other matters of handing in the product of their 
work at the office and reporting for a few other minor instructions. 

Group P used the Remington Guide as the source of exercise 
material, taking the lessons in consecutive order as they are given 
there. The subjects of Group W were first asked to make a chart
of the keyboard on common letter size news print paper to be used
in learning the keyboard. This chart, as well as the large one on 
the wall, was marked off by means of very distinct lines to indicate 
the respective fingers for the various sets of keys; then they were 
briefly instructed concerning the function of the various parts of the 
machine. At the end of these instructions the attention of the 
students was directed to the instructions written on the board. They 
read as follows: “Write a letter to Sears and Roebuck ordering a
pair of shoes which should cost $7.00 and to be sent by parcel post.”

The next exercise included a letter to a friend inviting her to an 
evening party. In some cases, where it was found necessary, they 
were asked to continue to rewrite the letters indicated above. It 
should be understood that in all such cases the student was free to 
formulate anew these letters and that they were never required, at 
this point in the learning, to copy the letters already written. 

This kind of self composed exercise material was used until each 
student felt sure that she could properly produce the desired characters with- 
out the aid of the key-board chart. At this point in the learning each 
student was required to do copy work.¹ The copy work consisted,
for the most part, of reading matter found in the Red Cross materials 
of the time, although many of the students used their history texts 
or other school books as copy material, believing that by so doing 
they could get some help in the preparation of these lessons while 
doing the typewriting. 

None of the students of either group had access to a machine 
outside of the forty minute period per day for five days per week 
throughout the year, except for the case of No. 11 in Group W who 
began much later than the other students of the group. It should 
be remembered that some of the students carried their keyboard 
charts home with them at first and used them in learning the work 
of the respective fingers in letter symbol production.²

In no case was it ever found necessary to check a student for 
attempting to drill on any isolated material for correcting letter 
errors. In all cases involving such errors, the students were required 





1 The writer appreciates that the material used in the composition or in the 
copy material does not conform to what has been found to be the most commonly 
used letter combinations in business or professional composition. A more scientific 
selection of such material should be had. 

2 One student (No. 8) memorized the keyboard during the first practice period, 
and during the evening at home by means of the chart.

to make the correction by reproducing enough of the context to include
a unit of thought factor. 

Data and Discussion.—It will be seen at once that for Group P
it was impossible to get any measure of the product in terms of words 
per minute before January third. About all that can be done at 
present by way of standardizing such work is to require the exercises, 
as outlined in the guide in use, and demand that these be reproduced
up to a certain degree of perfection in duplication. This was done, 
but it was of little service in getting a measure of the learning. 

This class of work was participated in by Group P until about the
middle of December (the students not being uniform in the amount
of work accomplished, an exact date in this matter is impossible)
when they had reached that part of the guide that presents compo- 
sition material. From this point on to the end of the course, the two 
groups were put together on the material of the guide. 

After two weeks of practice on the composition material by 
Group P (Jan. 3, 1919) the following relative conditions obtained 
for the two groups: 

                                                   | Group P | Group W
Average number of practice periods ............... |  75.9   |  20.75
Average number words attempted per minute ........ |  53.2   |  66.3
Average number of errors per minute .............. |  11.4   |   7.25
Average number words per practice period ......... |   0.70  |   3.19


On the basis of the time spent in practice, Group W produced 
4.56 times as good results as was the case for Group P. The students 
of Group W, up to this time, spent only 0.273 as much time as those 
of Group P had spent and had attained 4.56 times as great speed in 
production for the time spent, as well as reducing the number of errors 
much below what was found for Group P.¹
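
The two ratios quoted here can be rechecked from the figures in the table above. The short Python sketch below is offered only as a check on the arithmetic; the variable names are illustrative and are not taken from the article.

```python
# Figures from the table above (status of the two groups on Jan. 3, 1919).
practice_periods = {"P": 75.9, "W": 20.75}   # average number of practice periods
words_per_minute = {"P": 53.2, "W": 66.3}    # average words attempted per minute

# Last row of the table: words per minute divided by practice periods.
rate_per_period = {
    group: words_per_minute[group] / practice_periods[group]
    for group in practice_periods
}

# Share of Group P's practice time that Group W had spent.
time_ratio = practice_periods["W"] / practice_periods["P"]

# Speed attained per practice period, Group W relative to Group P.
efficiency_ratio = rate_per_period["W"] / rate_per_period["P"]

print(round(time_ratio, 3))        # 0.273
print(round(efficiency_ratio, 2))  # 4.56
```

Both values agree with the 0.273 and 4.56 given in the text.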

1 Nothing is claimed for the value of the composition material used for practice 
purposes in this experiment. Possibly even greater differences could have been 
had in the results of the two groups if material had been used that more nearly 
conforms to what is had by way of letter combinations required for business and 
professional composition; but even with the kind used, the results are very 
suggestive of what might be had with the more scientifically selected material. 

From the third week in January up to the time of closing school 
in the spring (May 28) measures were taken, at intervals of 
one to two weeks, of the number of words per minute written 
by each subject. The figures which immediately follow represent 
words per minute after the exercise in each case had been penalized 
ten words for each error made. 
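
The penalty rule just described can be sketched in a few lines of Python. This is an illustration under stated assumptions only: the article does not give the length of each exercise, nor does it say whether the deduction was made before or after dividing by the time, so the function name, its parameters, and the sample figures are all hypothetical.

```python
def penalized_words_per_minute(words_written, errors, minutes, penalty=10):
    """Net words per minute after deducting `penalty` words for each error.

    Assumption (not stated in the article): the deduction is taken from the
    total words written before dividing by the length of the exercise.
    """
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    net_words = words_written - penalty * errors
    return net_words / minutes


# Hypothetical exercise: 400 words written in 10 minutes with 3 errors.
score = penalized_words_per_minute(400, 3, 10)
print(score)  # 37.0
```

Under this reading a single error costs as much as ten correctly written words, which explains how an otherwise fast but careless exercise could score below a slower, cleaner one.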


Group P 

         |        Words per minute for respective measures        |
Subjects |   1  |   2  |   3  |   4  |   5  |   6  |   7  | Finals
    1    | 27.4 | 31.0 | 29.2 | 28.6 | 37.4 | 40.8 | 39.9 |  49.4
    2    | 48.3 | 45.8 | 49.8 | 47.4 | 49.6 | 52.6 | 49.3 |  55.2
    3    | 14.8 | 22.2 | 16.2 | 17.6 | 21.7 | 24.0 | 24.8 |  26.3
    4    | 22.6 | 25.6 | 23.4 | 18.2 | 25.8 | 27.4 | 30.8 |  37.1
    5    | 44.1 | 49.0 | 43.2 | 41.0 | 47.0 | 47.2 | 42.0 |  56.8
    6    | 33.4 | 36.6 | 37.4 | 25.2 | 28.6 | 35.6 | 30.8 |  39.4
    7    | 34.9 | 35.6 | 38.4 | 36.8 | 37.4 | 45.6 | 35.6 |  46.6
    8    | 33.3 | 29.8 | 29.8 | 24.2 | 17.2 | 37.2 | 31.1 |  43.0
    9    | 13.3 | 15.5 | 21.4 | 17.8 | 13.8 | 13.8 | 16.2 |  20.6
   10    | 28.9 | 36.2 | 31.0 | 28.4 | 32.1 | 36.2 | 33.1 |  39.5
   11    | 33.1 | 33.6 | 37.4 | 25.0 | 33.1 | 45.0 | 38.8 |  40.6
   12    | 30.2 | 21.4 | 21.4 | 20.2 | 33.9 | 29.7 | 23.7 |  43.2
   13    | 20.9 | 27.2 | 23.2 | 17.6 | 21.3 | 25.4 | 30.1 |  34.6
   14    | 22.1 | 31.2 | 23.1 | 23.2 | 26.3 | 24.2 | 28.8 |  30.3
   15    | 34.9 | 45.2 | 37.2 | 32.0 | 31.7 | 34.0 | 41.4 |  51.8
 Medians | 30.2 | 31.2 | 29.8 | 25.2 | 32.1 | 35.6 | 33.1 |  40.6

Group W 

         |        Words per minute for respective measures        |
Subjects |   1  |   2  |   3  |   4  |   5  |   6  |   7  | Finals
    1    | 18.2 | 22.0 | 18.6 | 19.6 | 18.5 | 25.4 | .... |  30.0
    2    | 39.6 | 33.0 | 35.0 | 41.7 | 41.2 | 49.4 | .... |  48.0
    3    | 20.4 | 17.0 | 22.0 | 23.8 | 31.4 | 24.8 | .... |  ....
    4    | This student would not use touch system
    5    | 50.4 | 69.4 | 45.0 | 60.1 | 56.0 | 66.0 | 55.9 |  70.2
    6    | 28.7 | 24.8 | 30.6 | 32.5 | 29.0 | 32.9 | 43.9 |  37.2
    7    | 32.6 | 32.0 | 32.4 | 29.0 | 32.9 | 40.6 | 57.4 |  46.0
    8    | 45.3 | 52.2 | 52.0 | 48.6 | 45.2 | 61.6 | 60.0 |  59.0
    9    | .... | .... | .... | .... | .... | .... | .... |  33.4
   10    | .... | .... | .... | .... | .... | .... | .... |  ....
   11    | 35.6 | 45.0 | 34.4 | 38.2 | 35.5 | 44.2 | 32.1 |  39.8
   12    | 17.1 | 26.0 | 19.2 | 18.8 | 21.4 | 33.1 | 37.3 |  28.8
 Medians | 32.6 | 33.0 | 32.4 | 32.5 | 32.9 | 40.6 | 43.9 |  39.8

The results here presented might give the impression, at first, 
that the subjects of Group P progressed more rapidly after they came 
to the composition material. This they did for a short time, but the 
final marks show that this did not continue to the end. This 
supposition is all the more incorrect when it is known that cases one, three, 
nine and two started as late as the last few days of November. 
Because of this late start the results in these cases were so irregular or 
missing altogether that they do not do justice to the “composition 
method” to include them. Case four of this group could not be induced 
to use the touch method in her work. For this reason the results in 
this case are not included. Case number ten was dropped from the 
course on account of physical inability to acquire this skill. 

The mortality in the work of typewriting for the two groups was 
decidedly greater for the cases in Group W; but it should be kept 
in mind that this group was made up of students who had already 
registered for a full course before taking up the typewriting. This 
made the work of this group very irregular. 


CURVES OF ACQUISITION AND OF ERRORS, GROUP W 

[Figure omitted: individual curves for Cases No. 11, 7, 3, 2, 6 and 5.] 


It was impossible to obtain curves representative of the early 
learning of Group P that could, in any way, be compared with the 
ones here presented. This was due to the nature of the exercise 
material used, and since the two situations represent great differences 
in this matter it is needless to present such curves here. 

The curves here presented are very much like those presented 
by other investigators of learning in this and in other fields, except 
that in the acquisition curves there is an absence of the initial rapid 
rise, although there seems to be a rather abrupt upward direction 
about the fourth day. This is about the time most of them had 
mastered the keyboard sufficiently to begin doing copy work. By 
this time the student had worked himself out from under what, to 
him, had seemed an overwhelming difficulty. 

Each student being taught by this “larger unit method” gave 
evidence of discouragement in her first few attempts at using the 
typewriter. This was hard to overcome. The complex of finding 
the proper key with the proper finger in the proper order by means 
of looking upon a chart, composing a sentence, keeping in mind the 
mechanics of composition construction, using back spacer, spacing 
properly, shifting carriage and attempting numerous other matters 
provides a situation which common sense has never been known to 
attack as a whole. This is what is asked of the student by this 
method of learning. It is likely that this overwhelming complexity 
is the factor responsible for the situation of “piecemeal” in most of 
our work of teaching and learning. 

This same practice of “piecemeal” is had in much of the work 
of secondary schools and in higher institutions. This is likely the 
situation for much of what is demanded as prerequisites. It might 
be found to be very economical to get the necessary mathematics 
for the later work of physics at the same time that the physics is 
given. It would be better to give the two together thus involving 
only the mathematics necessary to the other work. The same is 
likely true for the helps in English or any other subjects of curricula. 
Some such change as this is necessary before the arrangement of 
courses can ever conform to “beginning by doing a thing as nearly 
as possible in the way it is eventually to be done.” It seems that 
at some future time the content of all work of formal education will 
conform much more closely to the actual life outside than it has ever 
done and that the method used in teaching will approach much more 
closely the ways of doing the thing after it is learned. 











NOTES ON ARTICLES IN EDUCATIONAL 


INTELLIGENCE TESTS 


Army Alpha and Student's Grades, Illustrating the Value of the Regression 
Equation. Homer Davis. School and Society, 1921, September, 223-227. 
Results of an experiment conducted at Stanford University to determine the value 
of the Alpha intelligence test as a means of predicting the type of work students 
would do; several case studies; advantages of the plan. 

Criteria for the Regrading of Schools. James L. Stockton, Corinne Davis and 
M. Alice Cronin. The Elementary School Journal, 1921, September, 55-66. 
A program for efficient grading based on a central criterion of mental age supple- 
mented by criteria of (a) chronological age; (b) physical age; (c) pedagogical age; 
(d) character age. Case studies and results of such a program in the Training 
School of San Jose State Normal School. 

On the Need for Caution in Establishing Race Norms. Ada H. Arlitt. Journal 
Applied Psychology, 1921, June, 179-183. Report of an investigation to determine 
the relative influence of race and social status on the distribution of intelligence. 
Social status factor very important. 

Estimating Intelligence by Means of Printed Photographs. L. Dewey Anderson. 
Journal Applied Psychology, 1921, June, 152-155. Results of an investigation to 
determine the reliability of photographs for indicating the intelligence of strangers. 
Correlation between assigned ratings and intelligence .27. 

Two Cases Showing Marked Change in IQ. W. T. Root. Journal Applied 
Psychology, 1921, June, 156-158. Details of 2 Binet retests which showed distinct 
increase in IQ. Discussion of contributing factors. 

The Predictive Value of Short Intelligence Tests. C. F. Hansen, M. J. Ream. 
Journal Applied Psychology, 1921, June, 184-186. The reliability and predictive 
value of short intelligence tests as determined by an experiment with two groups of 
students in the School of Life Insurance Salesmanship at Carnegie Institute of 
Technology. 

Studies in Mental Tests. J. E. DeCamp. School and Society, 1921, October, 
254-258. Results of testing Pennsylvania State College Freshmen with Army 
Alpha, Thurstone IV, and Stanford-Binet. Unreliable for predicting quality of 
collegiate work. 


GIFTED CHILDREN 


Gifted Pupils in the High School. John C. Almack and James L. Almack. 
School and Society, 1921, September, 223-228. Conclusions as to number of 
gifted children in high schools; the best means of discovering them; their physical 
superiority; need for educational reorganization in their favor; and their superior 
social status; based on an investigation of the six upper grades of the Eugene, 
Oregon, schools. 

Preliminary Report on a Gifted Juvenile Author. Lewis M. Terman and Jessie C. 
Fenton. Journal Applied Psychology, 1921, June, 162-178. The history of 
Betty Ford. IQ by the Stanford-Binet—188; by the Army Beta Test—175. 


MEASUREMENT OF PERSONAL TRAITS 


A Preliminary Study of the Correlations Between Estimates of Volitional Traits 
and the Results from the Downey “Will-Profile.” G. M. Ruch. Journal of Applied 
Psychology, 1921, June, 159-162. Actual test scores of more than twenty graduate 
students on Downey “Individual Will-Temperament Test” compared with 
estimates for the same group secured from two groups of associates—university 
instructors and students in the same classes. 

The Measurement of Aggressiveness. H. T. Moore, A. R. Gilliland. Journal of 
Applied Psychology, 1921, June, 97-118. Description of three tests for measuring 
aggressiveness; results of the use of the tests in an experiment at Dartmouth. 


TRANSFER AND LEARNING 


First Year Latin and Growth in English Vocabulary. W. L. Carr. School and 
Society, 1921, September, 192-198. Description of an experiment conducted in 
7 schools to determine the effect of one year of Latin on a pupil’s “passive” English 
vocabulary; results show Latin to be a definite aid. 

What is the Disciplinary Value of the Classics? Thaddeus L. Bolton. School 
and Society, 1921, September, 205-210. An analysis of the term “mental 
discipline”; the classics as a tool with which to develop natural capacity; the 
possibility of using other tools for the same purpose. 

The Feebleminded Blind. Leila Holterhoff. School and Society, 1921, September, 
174-179. An experiment in teaching the mentally defective blind; the need for 
special classes for such children in our schools and institutions. 


TESTS FOR SPECIAL ABILITIES 


A Study in Industrial Psychology. Tests for Special Abilities. Elsie O. 
Bregman. Journal of Applied Psychology, 1921, June, 127-151. Report of an 
investigation to develop tests for special ability—sales-clerks and clerical workers 
in a large department store. 











NEW PUBLICATIONS IN EDUCATIONAL 
PSYCHOLOGY AND RELATED FIELDS OF 
EDUCATION 


1. An Advanced Text in the Psychology of Learning.—In the opening 
sentence Pyle states: “In this book I have tried to state everything 
that is known about learning.”¹ The writer has kept to his 
purpose and the product is a veritable encyclopedia, crammed with 
tables and graphs. If the reader expects to find it wearisome, he will 
be pleasantly surprised; Pyle has succeeded, as usual, in making heavy 
facts attractive. It is not, however, a book for lazy reading. 

The author adopts at the start the Situation-Response hypothesis, 
and in many respects his system is similar to that of Thorndike; but he 
does not always succeed in being consistent with the underlying logic 
of his position. There is, for example, a tendency to superimpose 
attention, attitudes, etc., as active forces upon the mechanics of learning. 

In the crucial problem of the transfer of training, Pyle states essen- 
tially the view presented in Thorndike’s treatise, but opposes Thorn- 
dike on the equally crucial problem of the organization of the mind. 
In the former, the author points out that what are usually called atti- 
tudes, ideals, attention, methods of attack, etc., fall within the hypoth- 
esis of identical elements. In considering the theories of Spearman and 
Thorndike relative to the general intellectual factor, the author, while 
admitting the insufficiency of evidence, is “inclined to believe that 
there is a general learning factor and also a general intellectual factor, 
a factor operative in all intellectual processes.” 

One chapter is devoted to a discussion of drill. After reviewing 
the evidence, the author states his position without equivocation: 
“The experiments leave no doubt of the great value of specific drill, of 
direct practice. There is no reason to beat about the bush, evade or 
come at it indirectly. I must know exactly what the skill is, have 
some good reason for desiring it, then I should practice it vigorously, 
regularly, directly.” 

Completing the conventional topics, the last three chapters are 
devoted to fatigue, the relation of instinctive traits to learning, and 
illustrations of certain statistical procedures. The instructor who 
uses the book will appreciate the class exercises which follow each 
chapter and the extensive bibliographies. It is a very useful, complete 
and comprehensive book which will serve admirably as a text for a 
course following an introductory survey of Educational Psychology. 

A. I. G. 


1 Pyle, William Henry: “The Psychology of Learning.” Baltimore: Warwick 
and York, 1921, pp. 308. 





2. An Important Study of the Physical Growth of Children from 
Birth to Maturity.—Every student of mental and physical development 
has appreciated the uncertainty of conclusions based on averages for 
different ages, and has deplored the lack of repeated measurements on 
the same individuals over a considerable period of time. This lack for 
physical growth Baldwin¹ has supplied in an important and almost 
monumental study. A similar study in mental growth is promised and 
it will be awaited with great interest. 

The monograph gives, in Parts IV, V, and VI, an historical survey 
of 911 investigations in physical growth in this country and abroad, 
643 comparative tables of measurements of infants, pre-school children, 
school children, and adults under thirty years of age, based on 
5,385,400 recorded cases in various countries and a carefully annotated 
bibliography of 911 titles. This gives some notion of the heroic propor- 
tions of the study. It summarizes all that science knows or reasonably 
conjectures on how children grow physically. 

Parts I, II, and III report Baldwin’s own comprehensive data. 
Part I gives a complete description of instruments and technique in 
securing twenty-three standard measurements with illustrations of 
preliminary work done under the auspices of the Iowa Welfare Station 
in several cities of the state. Part II gives the mean growth in weight 
of white and colored boys and girls from birth to the close of the first 
year with numerous charts giving individual growth curves. The 
correlations between weight at birth and various periods up to the 
close of the year are positive but low, especially so at the end of the 
year period. Norms for height, weight, and weight-height index for 
the first year are set up, based on 9074 Iowa infants, and comparisons 
are made with French and German and other American data. 

Norms for pre-school children in height, weight, and weight-height 
index are reported, based on 36,958 Iowa boys and girls between the 
ages of birth and six years. The results are from the extensive study 
by the Federal Children’s Bureau in the Children’s Campaign of 1918. 


1 Baldwin, Bird T.: The Physical Growth of Children from Birth to Maturity. 
University of Iowa Studies in Child Welfare, Vol. I, No. 1, pp. 1-411. 

Perhaps the most interesting and valuable part of the study is 
found in Chapters V and VI which set forth the typical growth histories 
of children between six and seventeen years of age, illustrated by 400 
individual growth curves in height, weight, breathing capacity, sitting 
height, chest girth, strength of right and left arms, and strength of 
upper back. Highly interesting are the individual synoptic profiles of 
growth in fifteen to twenty-two traits. Intercorrelations for the con- 
secutive development of nineteen traits for the years from seven to six- 
teen and for height, weight, and breathing capacity on college girls for 
the years from seventeen to twenty have been worked out for the first 
time. In addition, the total correlations have been analyzed by the 
method of partial correlations. The coefficients of variability tend, on 
the whole, to decrease from seven to seventeen, and, in general, to be 
higher in boys than in girls. The chapters are a veritable mine of 
information and contain scores of important conclusions or generaliza- 
tions which can not be detailed here. 

Anatomical development, measured by radiographs of the wrist 
bones, was determined on sixty-seven boys and girls and correlated 
with height and weight. The correlations are very high. The ana- 
tomical development of disparate twins shows, contrary to the uni- 
versal belief, very marked differences. Physiological age, as evidenced 
by the advent of pubescence or first menstruation, shows wide variations 
and low correlations with other traits. 

With such an array of data it is evident that Baldwin’s study will 
be the standard reference for some time to come, and that every 
student of psychology and education will want to possess it. 

¥. & G BB. 





3. A New Reading Monograph.—The field of children’s interests in 
reading has been well covered in the recently published investigation 
of Arthur M. Jordan, Ph.D.¹ Previous studies are carefully reviewed 
in the introductory chapters. 

Chapter II deals with the results obtained from the use of a 
questionnaire. Responses were obtained from 3,598 pupils in four cities. 
The tabulations show that the reading interests of boys and girls are 
far from identical and that in both cases some interests increase or 
decline with age. Every effort was made to secure uncensored 
statements and bona fide opinions of pupils and the conclusions drawn from 
the analysis of tabulations are set down in detail. 

1 Jordan, Arthur M.: Children’s Interests in Reading. New York: Teachers 
College, Columbia University Contributions to Education, No. 107, 1921, pp. 143. 

The investigator decided that more objective evidence could be 
obtained by observing the choices of children in public libraries and 
reports the results of extended observations in four libraries. The 
interrelation and correlation of these results with those obtained from 
questionnaires and the sale of books are discussed in a brief chapter. 

L. Z. 


4. Two Noteworthy Psychological Contributions Resulting from War 
Work.—(a) Among the psychological by-products of the war may be 
listed the recent volume on The Scientific Measurement of Trade 
Proficiency.¹ The extensive research financed by the government, under 
emergency conditions, facilitated classification of skilled personnel 
during the war. That the resulting tests may be a factor in industrial 
readjustment is the hope of the author. The volume shows the 
gradual development of technique in the construction and adminis- 
tration of trade tests: (1) the oral test; (2) the picture test; (3) the 
performance test; (4) the written test. The discussion is non-technical 
and readable. The principles underlying the selection of questions 
and the objective scoring of results are stated. The reasons given 
for the discarding of certain types of questions make for a better 
understanding of the objectivity of such measuring devices. Many 
trades are covered and the carefully standardized tests submitted 
should not only function in the solution of personnel problems in 
industry but also demonstrate the practical value of a thoroughgoing 
scientific attack on the psychological phases of industrial problems. 
(b) The work of the Psychology Committee of the National 
Research Council is also represented by the volume on army mental 
tests² published with the authorization of the War Department. The 
thoroughgoing nature of the work done by the co-operating psycholo- 
gists in the construction and standardization of tests is further 
exemplified in the tabulations and graphical representations of Chapter 
II. Mental tests demonstrated their value in the selection of men for 
officers’ training schools and other lines of service requiring special 
ability. They also facilitated the prompt recognition of men of 
extremely low intelligence. The military importance of segregating 
these men is obvious. 

1 Chapman, J. Crosby: “Trade Tests. The Scientific Measurement of Trade 
Proficiency.” New York: Henry Holt and Company, 1921, pp. IX + 436. 

2 Yoakum, Clarence S. and Yerkes, Robert M.: “Army Mental Tests.” New 
York: Henry Holt and Company, 1920, pp. XIII + 303. 

“The Examiner’s Guide” used during the war is included as Chapter 
III and is followed by a report of tests given in Students’ Army Train- 
ing Corps with tabular comparisons of the results with those obtained 
in various educational institutions. 

Chapter V deals with practical applications. Aside from the 
tremendous significance of “Mental Engineering During the War,” 
the tests comprised in Chapter VI throw light on numerous educational 
and industrial situations. The services of the Committee were cut 
short by the signing of the armistice but not before demonstrating (1) 
the application of the principles of psychology to concrete military 
problems, and (2) the importance of co-operation in practical scientific service. 

L. Z. 





5. Psychology Applied to Business.¹—This is a book on the 
psychology of buying and selling with little of direct interest to the 
educational psychologist or teacher. It is written for the most part 
in a simple and popular style, describing the psychological factors 
that are of importance in influencing people to buy. There are 
numerous examples and applications taken from the realm of business, 
but there are also several others taken from the standard psychological 
literature, some of which seem a little removed from the main purpose 
of the book. In describing unconscious memory we meet again Cole- 
ridge’s servant girl, which recalls the passage quoted by James and 
thereafter by many others. The author has made good use of what he 
calls the historical method in advertising, which is a measure of certain 
trends as illustrated by the change in percentages of different types of 
advertisements over a number of years. His use of this method is 
very ingenious, as, for example, when he obtains an indirect measure 
of truthfulness in advertising by counting the number of superlatives 
used (‘‘best,” “finest,” etc.) over a number of years. The counting and 
charting of various items over a period of years might in like manner 
prove useful in measuring trends in educational procedure. Alto- 
gether it is an interesting and stimulating little book, and it is simple 
and direct enough to appeal to the circle of readers for which it was 
written. 


R. PINTNER. 


1 Kitson, H. D.: “The Mind of the Buyer. A Psychology of Selling.” Macmillan, 
1921, pp. X + 211. 

6. Tests for Guidance in the High School.¹—This is the first of a 
series of educational monographs planned by the Journal of Educa- 
tional Research and edited by Dr. Buckingham, the editor of the 
Journal. He is to be congratulated upon the excellence of this first 
monograph, which has certainly set a high standard for future authors. 

Dr. Proctor set himself the task of finding out to what extent 
mental tests might be useful for the guidance of high school pupils 
and he has dealt with the subject in a very sound and sensible manner, 
showing what he believes to be their value at the present time and 
making no inordinate claims for them. Tests alone will not solve 
all the problems of vocational advice and direction. They are to be 
used along with teachers’ estimates of ability, records of school suc- 
cess, the vocational ambitions of the pupil, and, the author might very 
well have added, common sense. How such a combination may work 
is shown by the comparison of ‘‘guided” and “unguided” pupils, 
wherein we note the fewer number of failures in the guided group. 
Working from the army data as to intelligence and occupations, 
suggestions for vocational guidance, mainly of a negative kind, are made. 
By following up the pupils through high school and on to college, 
the author has shown the selective influence at work. The median 
IQ for first-year high-school pupils is 105; for high-school graduates 
111; for college entrants 116. 

The symmetry of the monograph is somewhat marred by raising 
the question of the relation of a particular test to a particular subject, 
working this out for one test and one subject, and then leaving the 
matter hanging in the air. It is obvious that we must have correla- 
tions of each test with English and with other subjects before we can 
make any statement at all as to the value of the analogies test, which is 
the one the author has chosen to correlate. We are left wondering 
why the author suddenly stopped. 

Of great value and interest are the mental age norms for the Army 
Alpha Scale. They differ at certain ages considerably from those 
given by the army workers. The army mental ages were derived from 
a group of adults tested on the Stanford. The norms in this book are 
based upon “several thousand California school children,” and should 
be more reliable than the army mental ages, which are absolutely 
conditioned by what adults can achieve on the Stanford Scale. It is a 
pity that an age distribution of the scores of all the cases tested was 
not included in the book. 

1 Proctor, W. M.: Psychological Tests and Guidance of High School Pupils. 
J. of Ed. Research Monographs, No. 1, June, 1921, Public School Publishing 
Company, pp. 70. 

This monograph is a valuable addition to the ever-growing library 
of books on tests, and should be in the hands of all those interested in 
mental examination and guidance of high school pupils. But why 
should the author append to this scientific piece of work a description 
of the tests written in the style of a publisher’s advertisement, for 
example, “the only test yet published . . .”; “ten well-selected 
tests,” etc.? The tests described do not need it. 

R. PINTNER. 


7. The Influence of Work and External Conditions upon Mental and 
Motor Efficiency.—In a monograph of 95 pages, Mr. Peaks¹ has 
summarized the important studies of the influence of time of year, time of 
day, weather, heat, humidity, etc., upon efficiency, considering the 
facts with regard to alleged long and short types of periodicity, 
together with a discussion of methodology and a presentation of original 
data. The author believes that the evidence favors a yearly rhythm 
marked by a depression in mid-winter with maxima in Spring and Fall, 
and a diurnal variation marked by an increase from morning to 
mid-afternoon, usually with a temporary depression at noon, but that weekly, 
twenty-three day, twenty-eight day and other alleged types of 
periodicity are not regularly found. Sufficient information was not available 
to enable the author to disentangle the multitude of possible causal 
factors. Fatigue, light, humidity, temperature, atmospheric pressure, 
meals, etc., are considered in turn, but none of these seems to 
influence mental efficiency seriously. The relative insusceptibility of the 
mechanisms involved to disturbance by these forces, which do greatly 
affect our feelings of fitness, makes an interesting chapter in psychology 
which is written in convenient form in this monograph. 

a. &. 





8. A Book for Teachers of Geography.—The following approximation 
to a definition of the term “project” is given in a new book on the use 
of problems and projects in Geography:² “Projects consist in doing 
what pupils think it worth while to do. By means of them the subject 
of Geography is vitalized, because projects involve the active and 
motivated participation of the pupils in carrying them to successful 
conclusions. . . . It is the real need for objective illustration that 
makes the projects vital.” 

1 Peaks, Archibald G.: Periodic Variations in Efficiency. Educational 
Psychology Monographs, No. 23, Baltimore: Warwick and York, 1921, pp. 95. 

2 Smith, E. Ehrlich: “Teaching Geography by Problems.” Garden City: 
Doubleday, Page and Company, 1921, pp. XIX + 306. 

The purpose of the book as stated in the foreword is to help teachers 
“by assisting to vitalize the subject of geography.” The author notes 
recent progress in the field and discusses the increasing significance of 
human geography in the solution of the problems of civilization. After 
a discussion of past and present school practice, he outlines progressive 
tendencies, lists necessary materials and discusses desirable procedures 
in the selection and solution of problems. Over one hundred pages are 
then given to illustrate problems. The appendix contains, among 
other things, valuable information concerning the accessibility of 
illustrative materials. 

The author is evidently of the opinion that geography projects, as 
such, should have a definite place in the program. The following 
quotations show his position with reference to the reconstruction of the 
curriculum and the source of problems: “Teachers will do well to make 
an inventory of the work of the grade which the course of study 
demands, by laying out their term’s problems and projects.” . . . 
“Thoughtful and extensive reading forms the basis for teaching by 
problems and projects. Out of this experience teachers become prepared 
to construct problems.” . . . “From a well selected bibliography, 
a teacher can construct a vast number of interesting problems.” 

The purpose of the book is manifestly ameliorative. It is addressed 
to classroom teachers, and does not go back to fundamental philosophic 
and scientific considerations upon which more thoroughgoing re- 
construction must depend. 

The author’s purpose would be better served by the omission of 
irrelevant qualifying phrases and indirect implications. The sentence 
structure is often awkward and, unless the reader stops to put the 
thought elements in more psychological sequence, the import of 
statements is lost. The following quotation is in point: “By 1880, for 
example, the factory had gained precedence over farming in England, 
even though at that time her farmers were making excellent wages, 
for the simple reason that means and methods of communication 
abroad so improved that she could obtain food supplies from the 
outside world cheaper than she could produce these at home.” A 
book designed to interpret new movements and encourage teachers to 
adopt progressive methods, deserves more careful psychological editing. 

L. Z. 